https://www.kaggle.com/competitions/spaceship-titanic/overview
In this competition your task is to predict whether a passenger was transported to an alternate dimension during the Spaceship Titanic's collision with the spacetime anomaly. To help you make these predictions, you're given a set of personal records recovered from the ship's damaged computer system.
File and Data Field Descriptions
train.csv - Personal records for about two-thirds (~8700) of the passengers, to be used as training data.
PassengerId - A unique Id for each passenger. Each Id takes the form gggg_pp where gggg indicates a group the passenger is travelling with and pp is their number within the group. People in a group are often family members, but not always.
HomePlanet - The planet the passenger departed from, typically their planet of permanent residence.
CryoSleep - Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
Cabin - The cabin number where the passenger is staying. Takes the form deck/num/side, where side can be either P for Port or S for Starboard.
Destination - The planet the passenger will be debarking to.
Age - The age of the passenger.
VIP - Whether the passenger has paid for special VIP service during the voyage.
RoomService, FoodCourt, ShoppingMall, Spa, VRDeck - Amount the passenger has billed at each of the Spaceship Titanic's many luxury amenities.
Name</i> - The first and last names of the passenger.
Transported - Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.
test.csv - Personal records for the remaining one-third (~4300) of the passengers, to be used as test data. Your task is to predict the value of Transported for the passengers in this set.
sample_submission.csv - A submission file in the correct format.
PassengerId - Id for each passenger in the test set.
Transported - The target. For each passenger, predict either True or False.
import pandas as pd
import numpy as np
import math
# Visualisierung
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(context='notebook', style='whitegrid', palette='muted')
# import MLs
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV
# import metrics
from sklearn.metrics import roc_curve
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
# Datavorbereitung & -transformation
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_selector
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
data = pd.read_csv("train.csv", sep=',', engine='python')
data.head()
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Maham Ofracculy | False |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | 549.0 | 44.0 | Juanna Vines | True |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | 6715.0 | 49.0 | Altark Susent | False |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | 3329.0 | 193.0 | Solam Susent | False |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | 565.0 | 2.0 | Willy Santantines | True |
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 8693 entries, 0 to 8692 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PassengerId 8693 non-null object 1 HomePlanet 8492 non-null object 2 CryoSleep 8476 non-null object 3 Cabin 8494 non-null object 4 Destination 8511 non-null object 5 Age 8514 non-null float64 6 VIP 8490 non-null object 7 RoomService 8512 non-null float64 8 FoodCourt 8510 non-null float64 9 ShoppingMall 8485 non-null float64 10 Spa 8510 non-null float64 11 VRDeck 8505 non-null float64 12 Name 8493 non-null object 13 Transported 8693 non-null bool dtypes: bool(1), float64(6), object(7) memory usage: 891.5+ KB
# Den Prozentsatz der leeren Werte für jede Spalte
data.isnull().sum()
d = {
'Column': data.isnull().sum().index,
'Num Nulls': data.isnull().sum().values,
'% Nulls': ((data.isnull().sum().values / data.shape[0]) * 100).round(2)}
nulls_df = pd.DataFrame(data=d)
nulls_df
| Column | Num Nulls | % Nulls | |
|---|---|---|---|
| 0 | PassengerId | 0 | 0.00 |
| 1 | HomePlanet | 201 | 2.31 |
| 2 | CryoSleep | 217 | 2.50 |
| 3 | Cabin | 199 | 2.29 |
| 4 | Destination | 182 | 2.09 |
| 5 | Age | 179 | 2.06 |
| 6 | VIP | 203 | 2.34 |
| 7 | RoomService | 181 | 2.08 |
| 8 | FoodCourt | 183 | 2.11 |
| 9 | ShoppingMall | 208 | 2.39 |
| 10 | Spa | 183 | 2.11 |
| 11 | VRDeck | 188 | 2.16 |
| 12 | Name | 200 | 2.30 |
| 13 | Transported | 0 | 0.00 |
## Verteilung der unbekannten Werte in Abhängigkeit vom Wert der Zielvariablen
columns=[]
transported=[]
not_transported=[]
for col in data.columns:
if data[col].isnull().sum() > 0:
tmp = data.loc[data[col].isnull(), ['Transported']].groupby('Transported').size()
columns.append(col)
transported.append(round(tmp[True] / tmp.values.sum() * 100, 2))
not_transported.append(round(tmp[False] / tmp.values.sum() * 100, 2))
d = {'Columns' : columns, 'transported %' : transported, 'not_transported %': not_transported }
null_distrib_df = pd.DataFrame(data=d)
null_distrib_df
| Columns | transported % | not_transported % | |
|---|---|---|---|
| 0 | HomePlanet | 51.24 | 48.76 |
| 1 | CryoSleep | 48.85 | 51.15 |
| 2 | Cabin | 50.25 | 49.75 |
| 3 | Destination | 50.55 | 49.45 |
| 4 | Age | 50.28 | 49.72 |
| 5 | VIP | 51.23 | 48.77 |
| 6 | RoomService | 45.86 | 54.14 |
| 7 | FoodCourt | 54.10 | 45.90 |
| 8 | ShoppingMall | 54.81 | 45.19 |
| 9 | Spa | 49.73 | 50.27 |
| 10 | VRDeck | 52.13 | 47.87 |
| 11 | Name | 50.50 | 49.50 |
# nulls in Zeilen
tmp = data.isnull().sum(axis=1)
nulls_in_rows = pd.DataFrame(data = {'row' : tmp.index, 'null_num' : tmp.values})
nulls_in_rows.groupby('null_num', as_index=False).size()
| null_num | size | |
|---|---|---|
| 0 | 0 | 6606 |
| 1 | 1 | 1867 |
| 2 | 2 | 203 |
| 3 | 3 | 17 |
# Zeile mit nulls anschauen
data.iloc[data.isnull().sum(axis=1).gt(2).values, :]
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1203 | 1284_01 | Mars | True | F/247/S | NaN | NaN | False | 0.0 | NaN | 0.0 | 0.0 | 0.0 | Hal Knité | True |
| 2639 | 2822_02 | Earth | NaN | G/450/S | TRAPPIST-1e | 5.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | NaN | Salley Harverez | False |
| 2762 | 2970_01 | Earth | False | NaN | TRAPPIST-1e | NaN | False | 740.0 | 82.0 | 6.0 | NaN | 1.0 | Dwin Adkinson | False |
| 3072 | 3315_01 | Earth | NaN | F/627/S | TRAPPIST-1e | 15.0 | False | 10.0 | 0.0 | 99.0 | NaN | 2031.0 | NaN | False |
| 3535 | 3790_01 | NaN | True | G/620/P | TRAPPIST-1e | 13.0 | False | 0.0 | NaN | 0.0 | NaN | 0.0 | Trick Meyersones | True |
| 3882 | 4167_01 | Earth | False | NaN | PSO J318.5-22 | NaN | NaN | 0.0 | 440.0 | 0.0 | 0.0 | 334.0 | Ninaha Deckerson | False |
| 4164 | 4446_05 | Europa | NaN | B/175/S | TRAPPIST-1e | 33.0 | False | 0.0 | 4017.0 | NaN | NaN | 2260.0 | Phah Chocaters | True |
| 4548 | 4840_01 | NaN | True | F/915/S | TRAPPIST-1e | 36.0 | False | 0.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | True |
| 5208 | 5555_01 | NaN | False | G/896/S | NaN | 43.0 | NaN | 1.0 | 0.0 | 213.0 | 7.0 | 701.0 | Winia Blanglison | True |
| 5409 | 5777_01 | Earth | NaN | F/1199/P | PSO J318.5-22 | 46.0 | NaN | 559.0 | 25.0 | NaN | 22.0 | 765.0 | Katen River | False |
| 5806 | 6141_02 | Earth | False | NaN | NaN | 21.0 | False | 28.0 | 0.0 | 0.0 | 662.0 | 0.0 | NaN | False |
| 6057 | 6405_02 | Earth | NaN | NaN | NaN | 2.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Feline Toddleton | True |
| 6112 | 6451_01 | Mars | False | NaN | NaN | 37.0 | False | 610.0 | NaN | 410.0 | 148.0 | 14.0 | Carkes Panad | False |
| 6904 | 7314_01 | Europa | NaN | C/237/P | NaN | 33.0 | False | 0.0 | 0.0 | 0.0 | NaN | 0.0 | Aldun Venticable | True |
| 7019 | 7472_01 | Mars | True | F/1426/S | TRAPPIST-1e | NaN | False | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | True |
| 7211 | 7703_02 | NaN | True | G/1251/S | TRAPPIST-1e | 13.0 | False | 0.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | True |
| 7682 | 8202_03 | NaN | True | C/306/S | TRAPPIST-1e | NaN | False | 0.0 | 0.0 | 0.0 | 0.0 | NaN | Charga Unkcatted | True |
# den Inhalt der Object Spalten anzeigen
category_features = data.select_dtypes(include="object").columns
for c in category_features:
if c != 'name':
print(f"Column: '{c}'",)
print(data.loc[:, c].unique())
Column: 'PassengerId' ['0001_01' '0002_01' '0003_01' ... '9279_01' '9280_01' '9280_02'] Column: 'HomePlanet' ['Europa' 'Earth' 'Mars' nan] Column: 'CryoSleep' [False True nan] Column: 'Cabin' ['B/0/P' 'F/0/S' 'A/0/S' ... 'G/1499/S' 'G/1500/S' 'E/608/S'] Column: 'Destination' ['TRAPPIST-1e' 'PSO J318.5-22' '55 Cancri e' nan] Column: 'VIP' [False True nan] Column: 'Name' ['Maham Ofracculy' 'Juanna Vines' 'Altark Susent' ... 'Fayey Connon' 'Celeon Hontichre' 'Propsh Hontichre']
for_visual_df = data.copy()
Gemäß der Beschreibung des ursprünglichen Datensatzes die Informationen in den Spalten: 'Cabin', 'PassengerId', 'Name' in mehrere Spalten aufteilen
--> Cabin: 'CabinDeck', 'CabinNum', 'CabinSide'
for_visual_df[['CabinDeck', 'CabinNum', 'CabinSide']] = for_visual_df['Cabin'].str.split('/', expand=True)
--> PassengerId: 'GroupId', 'NumInGroup'
for_visual_df[['GroupId', 'NumInGroup']] = for_visual_df['PassengerId'].str.split('_', expand=True)
# numerische Spalten in den Typ float umwandeln
for_visual_df['CabinNum'] = for_visual_df.loc[:,'CabinNum'].astype(float)
for_visual_df['GroupId'] = for_visual_df.loc[:,'GroupId'].astype(float)
--> Name: FirstName, LastName
for_visual_df[['FirstName', 'LastName']] = for_visual_df['Name'].str.split(' ', expand=True)
Auf der Grundlage der vorhandenen Daten weitere neue Spalte erstellen
--> 'GroupSize' (die Anzahl der Personen in jeder Gruppe)
GroupSize_df = for_visual_df.groupby('GroupId', as_index=False).agg(
GroupSize = ('PassengerId', 'count'))
for_visual_df = for_visual_df.merge(GroupSize_df, how='left', left_on='GroupId', right_on='GroupId')
for_visual_df.head(5)
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | ... | Name | Transported | CabinDeck | CabinNum | CabinSide | GroupId | NumInGroup | FirstName | LastName | GroupSize | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | ... | Maham Ofracculy | False | B | 0.0 | P | 1.0 | 01 | Maham | Ofracculy | 1 |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | ... | Juanna Vines | True | F | 0.0 | S | 2.0 | 01 | Juanna | Vines | 1 |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | ... | Altark Susent | False | A | 0.0 | S | 3.0 | 01 | Altark | Susent | 2 |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | ... | Solam Susent | False | A | 0.0 | S | 3.0 | 02 | Solam | Susent | 2 |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | ... | Willy Santantines | True | F | 1.0 | S | 4.0 | 01 | Willy | Santantines | 1 |
5 rows × 22 columns
--> 'TotalSpend' (Gesamtbetrag der von den Passagieren an Bord des Schiffes ausgegebenen Gelder)
for_visual_df['TotalSpend'] = for_visual_df.loc[:,
['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']].sum(axis=1, min_count=5)
--> 'HomePlanet' + 'Destination' = 'Route'
for_visual_df['Route'] = for_visual_df['HomePlanet'] + ' - ' +for_visual_df['Destination']
# die Anzahl der Kabinen in jeder Gruppe
for_visual_df.groupby(['GroupId'],as_index=False).agg(cabins=('CabinNum', lambda x: len(x.unique()))).sort_values('cabins', ascending=False)
| GroupId | cabins | |
|---|---|---|
| 1388 | 2092.0 | 4.0 |
| 5459 | 8129.0 | 4.0 |
| 1132 | 1709.0 | 4.0 |
| 5814 | 8668.0 | 3.0 |
| 2611 | 3911.0 | 3.0 |
| ... | ... | ... |
| 2155 | 3217.0 | 1.0 |
| 2153 | 3215.0 | 1.0 |
| 2152 | 3213.0 | 1.0 |
| 2151 | 3212.0 | 1.0 |
| 6216 | 9280.0 | 1.0 |
6217 rows × 2 columns
--> 'IsSinge' (reist die Person allein oder in einer Gruppe oder?)
for_visual_df['IsSingle'] = for_visual_df.loc[:, 'GroupSize'].apply(lambda x: x==1 )
--> 'NoSpend' (der Passagier hat keine Ausgaben an Bord des Schiffes)
for_visual_df['NoSpend'] = for_visual_df.loc[:, 'TotalSpend'].apply(lambda x: np.nan if math.isnan(x) else x==0)
--> 'IsChild' (Ist der Passagier minderjährig?)
for_visual_df['IsChild'] = for_visual_df.loc[:, 'Age'].apply(lambda x: np.nan if math.isnan(x) else x < 18)
--> 'namesakes_num_in_group' (Anzahl der Namensvettern in der Gruppe)
namesakes_in_group_df = for_visual_df.groupby(['LastName', 'GroupId'], as_index=False).agg(
namesakes_num_in_group = ('PassengerId', 'count'))
for_visual_df = for_visual_df.merge(
namesakes_in_group_df, how='left',
left_on=['LastName', 'GroupId'],
right_on=['LastName', 'GroupId'])
for_visual_df['namesakes_num_in_group'] = for_visual_df['namesakes_num_in_group'] - 1
# #2. число разных фамилий в группе
# namesakes_in_group_df = for_visual_df.groupby(['GroupId'], as_index=False).agg(
# namesakes_num_in_group = ('LastName', lambda x: len(x.unique())))
# for_visual_df = for_visual_df.merge(namesakes_in_group_df,
# how='left',
# left_on=['LastName', 'GroupId'],
# right_on=['LastName', 'GroupId'])
# for_visual_df['namesakes_num_in_group'] = for_visual_df['namesakes_num_in_group'] - 1
# #3. число кабин с однофамильцами
# cabins_with_namesakes_df = namesakes_in_cabin_df.groupby(['LastName'], as_index=False).agg(
# cabins_num_with_namesakes = ('CabinNum', 'count'))
# for_visual_df = for_visual_df.merge(cabins_with_namesakes_df, how='left', left_on='LastName', right_on='LastName')
# for_visual_df['cabins_num_with_namesakes'] = for_visual_df['cabins_num_with_namesakes'] - 1
--> 'NameLength' (die Anzahl der Zeichen im Passagiernamen)
for_visual_df['NameLength'] = for_visual_df.loc[:, 'Name'].str.len()
for_visual_df.head(10)
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | ... | FirstName | LastName | GroupSize | TotalSpend | Route | IsSingle | NoSpend | IsChild | namesakes_num_in_group | NameLength | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | ... | Maham | Ofracculy | 1 | 0.0 | Europa - TRAPPIST-1e | True | True | False | 0.0 | 15.0 |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | ... | Juanna | Vines | 1 | 736.0 | Earth - TRAPPIST-1e | True | False | False | 0.0 | 12.0 |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | ... | Altark | Susent | 2 | 10383.0 | Europa - TRAPPIST-1e | False | False | False | 1.0 | 13.0 |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | ... | Solam | Susent | 2 | 5176.0 | Europa - TRAPPIST-1e | False | False | False | 1.0 | 12.0 |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | ... | Willy | Santantines | 1 | 1091.0 | Earth - TRAPPIST-1e | True | False | True | 0.0 | 17.0 |
| 5 | 0005_01 | Earth | False | F/0/P | PSO J318.5-22 | 44.0 | False | 0.0 | 483.0 | 0.0 | ... | Sandie | Hinetthews | 1 | 774.0 | Earth - PSO J318.5-22 | True | False | False | 0.0 | 17.0 |
| 6 | 0006_01 | Earth | False | F/2/S | TRAPPIST-1e | 26.0 | False | 42.0 | 1539.0 | 3.0 | ... | Billex | Jacostaffey | 2 | 1584.0 | Earth - TRAPPIST-1e | False | False | False | 1.0 | 18.0 |
| 7 | 0006_02 | Earth | True | G/0/S | TRAPPIST-1e | 28.0 | False | 0.0 | 0.0 | 0.0 | ... | Candra | Jacostaffey | 2 | NaN | Earth - TRAPPIST-1e | False | NaN | False | 1.0 | 18.0 |
| 8 | 0007_01 | Earth | False | F/3/S | TRAPPIST-1e | 35.0 | False | 0.0 | 785.0 | 17.0 | ... | Andona | Beston | 1 | 1018.0 | Earth - TRAPPIST-1e | True | False | False | 0.0 | 13.0 |
| 9 | 0008_01 | Europa | True | B/1/P | 55 Cancri e | 14.0 | False | 0.0 | 0.0 | 0.0 | ... | Erraiam | Flatic | 3 | 0.0 | Europa - 55 Cancri e | False | True | True | 2.0 | 14.0 |
10 rows × 29 columns
fig, axs = plt.subplots(figsize=(10, 8))
d = for_visual_df.groupby('Transported', as_index=False).size()
axs.pie(d['size'], labels= d['Transported'], autopct='%1.1f%%')
([<matplotlib.patches.Wedge at 0x1bc25bc1f70>, <matplotlib.patches.Wedge at 0x1bc25bcc820>], [Text(0.012522008688130899, 1.0999287246446539, 'False'), Text(-0.012522008688131034, -1.0999287246446539, 'True')], [Text(0.006830186557162308, 0.5999611225334475, '49.6%'), Text(-0.006830186557162382, -0.5999611225334475, '50.4%')])
Die Gruppen: Transported und Not Transported sind ungefähr gleich groß. 50.4% und 49.6%
for_visual_df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 8693 entries, 0 to 8692 Data columns (total 29 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PassengerId 8693 non-null object 1 HomePlanet 8492 non-null object 2 CryoSleep 8476 non-null object 3 Cabin 8494 non-null object 4 Destination 8511 non-null object 5 Age 8514 non-null float64 6 VIP 8490 non-null object 7 RoomService 8512 non-null float64 8 FoodCourt 8510 non-null float64 9 ShoppingMall 8485 non-null float64 10 Spa 8510 non-null float64 11 VRDeck 8505 non-null float64 12 Name 8493 non-null object 13 Transported 8693 non-null bool 14 CabinDeck 8494 non-null object 15 CabinNum 8494 non-null float64 16 CabinSide 8494 non-null object 17 GroupId 8693 non-null float64 18 NumInGroup 8693 non-null object 19 FirstName 8493 non-null object 20 LastName 8493 non-null object 21 GroupSize 8693 non-null int64 22 TotalSpend 7785 non-null float64 23 Route 8314 non-null object 24 IsSingle 8693 non-null bool 25 NoSpend 7785 non-null object 26 IsChild 8514 non-null object 27 namesakes_num_in_group 8493 non-null float64 28 NameLength 8493 non-null float64 dtypes: bool(2), float64(11), int64(1), object(15) memory usage: 1.9+ MB
category_features = [
'HomePlanet', 'Destination', 'CryoSleep', 'VIP', 'CabinDeck', 'CabinSide', 'IsChild', 'NoSpend', 'IsSingle']
fig, axs = plt.subplots(9, 2, figsize=(20, 55))
i = 0
axe = axs.ravel()
for f in category_features: #i, f in enumerate(category_features):
d1 = for_visual_df.groupby(f, as_index=False, dropna=False).size()
d1= d1.fillna('Unknown')
axe[i].pie(d1['size'], labels=d1[f], autopct='%1.1f%%')
axe[i].set_title(f)
i+=1
d2 = for_visual_df.groupby([f, 'Transported'], as_index=False, dropna=False).size()
d2= d2.fillna('Unknown')
sns.barplot(x=f, y="size",
hue="Transported",
data=d2, ax=axe[i])
i+=1
Erkenntnisse:
--------- Plot 1 'HomePlanet' --------
--------- Plot 2 'Destination' --------
fig, axs = plt.subplots(figsize=(10, 5))
d = for_visual_df.groupby(['Route', 'Transported'], as_index=False, dropna=False).size()
d= d.fillna('Unknown')
sns.barplot(x='size', y="Route",
hue="Transported",
data=d)
<AxesSubplot:xlabel='size', ylabel='Route'>
sns.histplot(data=for_visual_df, x='RoomService', bins=40)
<AxesSubplot:xlabel='RoomService', ylabel='Count'>
num_features = ['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'TotalSpend' ]
for f in num_features:
for_visual_df[f+'_log'] = np.log10(for_visual_df[f] + 1)
for_visual_df.head(5)
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | ... | NoSpend | IsChild | namesakes_num_in_group | NameLength | RoomService_log | FoodCourt_log | ShoppingMall_log | Spa_log | VRDeck_log | TotalSpend_log | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | Europa | False | B/0/P | TRAPPIST-1e | 39.0 | False | 0.0 | 0.0 | 0.0 | ... | True | False | 0.0 | 15.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 1 | 0002_01 | Earth | False | F/0/S | TRAPPIST-1e | 24.0 | False | 109.0 | 9.0 | 25.0 | ... | False | False | 0.0 | 12.0 | 2.041393 | 1.000000 | 1.414973 | 2.740363 | 1.653213 | 2.867467 |
| 2 | 0003_01 | Europa | False | A/0/S | TRAPPIST-1e | 58.0 | True | 43.0 | 3576.0 | 0.0 | ... | False | False | 1.0 | 13.0 | 1.643453 | 3.553519 | 0.000000 | 3.827111 | 1.698970 | 4.016365 |
| 3 | 0003_02 | Europa | False | A/0/S | TRAPPIST-1e | 33.0 | False | 0.0 | 1283.0 | 371.0 | ... | False | False | 1.0 | 12.0 | 0.000000 | 3.108565 | 2.570543 | 3.522444 | 2.287802 | 3.714078 |
| 4 | 0004_01 | Earth | False | F/1/S | TRAPPIST-1e | 16.0 | False | 303.0 | 70.0 | 151.0 | ... | False | True | 0.0 | 17.0 | 2.482874 | 1.851258 | 2.181844 | 2.752816 | 0.477121 | 3.038223 |
5 rows × 35 columns
fig, axs = plt.subplots(figsize=(15, 7))
sns.histplot(data=for_visual_df, x='Age', bins=80, hue="Transported", kde=True)
<AxesSubplot:xlabel='Age', ylabel='Count'>
num_features = ['RoomService', 'RoomService_log',
'FoodCourt', 'FoodCourt_log',
'ShoppingMall','ShoppingMall_log',
'Spa', 'Spa_log',
'VRDeck', 'VRDeck_log']
fig, axs = plt.subplots(5, 2, figsize=(20, 30))
axe = axs.ravel()
for i, f in enumerate(num_features):
sns.histplot(data=for_visual_df, x=f, ax=axe[i], bins=40, hue="Transported", kde=True)
if i%2:
# axe[i].set(xscale="log")
axe[i].set_xlim(0,4.5)
axe[i].set_ylim(0,220)
fig, axs = plt.subplots(1, 2, figsize=(20, 7))
axe = axs.ravel()
sns.histplot(data=for_visual_df, x='TotalSpend', ax=axe[0], bins=40, hue="Transported", kde=True)
sns.histplot(data=for_visual_df, x='TotalSpend_log', ax=axe[1], bins=40, hue="Transported", kde=True)
<AxesSubplot:xlabel='TotalSpend_log', ylabel='Count'>
fig, axs = plt.subplots(figsize=(10, 5))
d = for_visual_df.groupby(['GroupSize', 'Transported'], as_index=False, dropna=False).size()
sns.barplot(x='GroupSize', y="size",
hue="Transported",
data=d)
<AxesSubplot:xlabel='GroupSize', ylabel='size'>
d=for_visual_df.groupby(['Destination', 'HomePlanet']).size().unstack().fillna(0)
fig, axs = plt.subplots(figsize=(12, 6))
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r')
<AxesSubplot:xlabel='HomePlanet', ylabel='Destination'>
d=for_visual_df.groupby(['HomePlanet', 'CabinDeck']).size().unstack().fillna(0)
fig, axs = plt.subplots(figsize=(12, 6))
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r')
<AxesSubplot:xlabel='CabinDeck', ylabel='HomePlanet'>
d=for_visual_df.groupby(['Destination', 'CabinDeck']).size().unstack().fillna(0)
fig, axs = plt.subplots(figsize=(12, 6))
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r')
<AxesSubplot:xlabel='CabinDeck', ylabel='Destination'>
d=for_visual_df.groupby(['Route', 'CabinDeck']).size().unstack().fillna(0)
fig, axs = plt.subplots(figsize=(12, 6))
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r')
<AxesSubplot:xlabel='CabinDeck', ylabel='Route'>
for_visual_df.loc[:, ['GroupId', 'Age', 'RoomService', 'FoodCourt', 'ShoppingMall',
'Spa', 'VRDeck', 'CabinNum', 'GroupSize', 'TotalSpend']].info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 8693 entries, 0 to 8692 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 GroupId 8693 non-null float64 1 Age 8514 non-null float64 2 RoomService 8512 non-null float64 3 FoodCourt 8510 non-null float64 4 ShoppingMall 8485 non-null float64 5 Spa 8510 non-null float64 6 VRDeck 8505 non-null float64 7 CabinNum 8494 non-null float64 8 GroupSize 8693 non-null int64 9 TotalSpend 7785 non-null float64 dtypes: float64(9), int64(1) memory usage: 747.1 KB
sns.pairplot(for_visual_df.loc[:, ['GroupId', 'Age', 'RoomService_log', 'FoodCourt_log', 'ShoppingMall_log',
'Spa_log', 'VRDeck_log', 'CabinNum', 'GroupSize', 'TotalSpend']],
corner=True)
<seaborn.axisgrid.PairGrid at 0x1bc26f730a0>
fig, axs = plt.subplots(figsize=(12, 6))
sns.scatterplot(data=for_visual_df, x='CabinNum', y='GroupId', hue='CabinDeck', ax=axs)
<AxesSubplot:xlabel='CabinNum', ylabel='GroupId'>
fig, axs = plt.subplots(figsize=(12, 6))
sns.scatterplot(data=for_visual_df,x="Age", y="TotalSpend", hue='CabinDeck',ax=axs)
<AxesSubplot:xlabel='Age', ylabel='TotalSpend'>
for_visual_df.loc[for_visual_df['Age'] < 13, ['CabinDeck']].value_counts()
CabinDeck G 548 F 173 E 23 C 16 B 15 A 12 D 2 dtype: int64
fig, axs = plt.subplots(3, 2, figsize=(20, 20))
axs = axs.ravel()
d=for_visual_df.groupby(['CryoSleep', 'HomePlanet']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[0])
d=for_visual_df.groupby(['CryoSleep', 'Destination']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[1])
d=for_visual_df.groupby(['CryoSleep', 'CabinDeck']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[2])
d=for_visual_df.groupby(['CryoSleep', 'CabinSide']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[3])
d=for_visual_df.groupby(['CryoSleep', 'IsChild']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[4])
d=for_visual_df.groupby(['CryoSleep', 'NoSpend']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[5])
# d=for_visual_df.loc[for_visual_df['IsChild']==False, :].groupby(['CryoSleep', 'NoSpend']).size().unstack().fillna(0)
# sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[6])
<AxesSubplot:xlabel='NoSpend', ylabel='CryoSleep'>
for_visual_df.groupby(['CryoSleep']).agg(
max_spend=('TotalSpend', 'max'),
min_spend=('TotalSpend', 'min'),
count=('PassengerId', 'count'),
min_age=('Age', 'min'),
max_age=('Age', 'max')
)
| max_spend | min_spend | count | min_age | max_age | |
|---|---|---|---|---|---|
| CryoSleep | |||||
| False | 35987.0 | 0.0 | 5439 | 0.0 | 79.0 |
| True | 0.0 | 0.0 | 3037 | 0.0 | 78.0 |
d= for_visual_df.groupby(['CabinNum', 'CabinDeck'], as_index=False).agg(TotalSpendProCabin = ('TotalSpend', 'sum'))
fig, axs = plt.subplots(figsize=(12, 6))
sns.scatterplot(data=d, x='CabinNum', y='TotalSpendProCabin', ax=axs)
<AxesSubplot:xlabel='CabinNum', ylabel='TotalSpendProCabin'>
fig, axs = plt.subplots(3, 1, figsize=(15, 13))
d=for_visual_df.groupby(['VIP', 'HomePlanet']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[0])
d=for_visual_df.groupby(['VIP', 'Destination']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[1])
d=for_visual_df.groupby(['VIP', 'CabinDeck']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs[2])
<AxesSubplot:xlabel='CabinDeck', ylabel='VIP'>
-- Es gibt keine VIP-Passagiere auf Deck G, alle Passagiere auf Deck G sind vom Planeten Erde
for_visual_df.loc[(for_visual_df['VIP'].notnull()) & (for_visual_df['VIP']) & (for_visual_df['HomePlanet'].isnull()), :]
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | ... | NoSpend | IsChild | namesakes_num_in_group | NameLength | RoomService_log | FoodCourt_log | ShoppingMall_log | Spa_log | VRDeck_log | TotalSpend_log | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 291 | 0321_01 | NaN | False | F/61/S | TRAPPIST-1e | 59.0 | True | 1018.0 | 0.0 | 209.0 | ... | False | False | 0.0 | 12.0 | 3.008174 | 0.000000 | 2.322219 | 0.000000 | 0.000000 | 3.089198 |
| 365 | 0402_01 | NaN | True | D/15/S | 55 Cancri e | 32.0 | True | 0.0 | 0.0 | 0.0 | ... | True | False | 0.0 | 12.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 405 | 0444_02 | NaN | False | F/99/P | TRAPPIST-1e | 26.0 | True | 1869.0 | 0.0 | 136.0 | ... | False | False | 1.0 | 11.0 | 3.271842 | 0.000000 | 2.136721 | 0.000000 | 0.000000 | 3.302331 |
| 7042 | 7500_01 | NaN | False | F/1432/S | TRAPPIST-1e | 54.0 | True | 1770.0 | 7.0 | 228.0 | ... | False | False | 0.0 | 11.0 | 3.248219 | 0.903090 | 2.359835 | 0.000000 | 0.000000 | 3.302331 |
| 7786 | 8314_02 | NaN | False | D/245/S | 55 Cancri e | 29.0 | True | 0.0 | 2949.0 | 0.0 | ... | False | False | 1.0 | 17.0 | 0.000000 | 3.469822 | 0.000000 | 0.477121 | 2.816241 | 3.557026 |
5 rows × 35 columns
fig, axs = plt.subplots(figsize=(10, 4))
d=for_visual_df.groupby(['CryoSleep', 'VIP']).size().unstack().fillna(0)
sns.heatmap(d, annot=True, fmt='g', cmap='mako_r', ax=axs)
<AxesSubplot:xlabel='VIP', ylabel='CryoSleep'>
# d = for_visual_df.groupby('GroupId', as_index=False).agg(
# sum_cryosleep=('CryoSleep', 'sum'),
# count = ('PassengerId','size'))
# d.sort_values('sum_cryosleep', ascending=False).head(5)
for_visual_df.columns
Index(['PassengerId', 'HomePlanet', 'CryoSleep', 'Cabin', 'Destination', 'Age',
'VIP', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck',
'Name', 'Transported', 'CabinDeck', 'CabinNum', 'CabinSide', 'GroupId',
'NumInGroup', 'FirstName', 'LastName', 'GroupSize', 'TotalSpend',
'Route', 'IsSingle', 'NoSpend', 'IsChild', 'namesakes_num_in_group',
'NameLength', 'RoomService_log', 'FoodCourt_log', 'ShoppingMall_log',
'Spa_log', 'VRDeck_log', 'TotalSpend_log'],
dtype='object')
name_features = ['namesakes_num_in_group', 'NameLength', ]
fig, axs = plt.subplots(2, 1, figsize=(10, 10))
i = 0
axe = axs.ravel()
for i, f in enumerate(name_features):
d2 = for_visual_df.groupby([f, 'Transported'], as_index=False, dropna=False).size()
d2= d2.fillna('Unknown')
sns.barplot(x=f, y="size",
hue="Transported",
data=d2, ax=axe[i])
expenses_cols = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", 'VRDeck']
data['Transported'] = data["Transported"].astype(int)
# trenne in trainings und test daten
data_train, data_test = train_test_split(data, train_size=0.8, random_state=7)
# data_train = data
# trenne in X_train/y_train und X_test/y_test
X_train = data_train.drop(columns="Transported")
y_train = data_train.loc[:, "Transported"]
X_test = data_test.drop(columns="Transported")
y_test = data_test.loc[:, "Transported"]
# Data Transformationen:
# - Name - split: 'FirstName', 'LastName'
#- PassengerId - split: 'GroupId', 'NumInGroup' -- to float
def new_features_create(df):
df[['FirstName', 'LastName']] = df['Name'].str.split(' ', expand=True)
df[['GroupId', 'NumInGroup']] = df['PassengerId'].str.split('_', expand=True)
df[['CabinDeck', 'CabinNum', 'CabinSide']] = df['Cabin'].str.split('/', expand=True)
df['IsChild'] = df.loc[:, 'Age'].apply(lambda x: np.nan if math.isnan(x) else x < 18)
df['TotalSpend'] = df.loc[:, expenses_cols].sum(axis=1, min_count=5)
df['GroupId'] = df.loc[:,'GroupId'].astype(float)
df['NumInGroup'] = df.loc[:,'NumInGroup'].astype(float)
df['CabinNum'] = df.loc[:,'CabinNum'].astype(float)
GroupSize_df = df.groupby('GroupId', as_index=False).agg(GroupSize = ('PassengerId', 'count'))
df = df.merge(GroupSize_df, how='left', left_on='GroupId', right_on='GroupId', copy=False)
df['IsSingle'] = df.loc[:, 'GroupSize'].apply(lambda x: x==1 )
return df
X_train = new_features_create(X_train)
X_train.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 6954 entries, 0 to 6953 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PassengerId 6954 non-null object 1 HomePlanet 6788 non-null object 2 CryoSleep 6789 non-null object 3 Cabin 6797 non-null object 4 Destination 6801 non-null object 5 Age 6800 non-null float64 6 VIP 6788 non-null object 7 RoomService 6808 non-null float64 8 FoodCourt 6806 non-null float64 9 ShoppingMall 6796 non-null float64 10 Spa 6802 non-null float64 11 VRDeck 6803 non-null float64 12 Name 6789 non-null object 13 FirstName 6789 non-null object 14 LastName 6789 non-null object 15 GroupId 6954 non-null float64 16 NumInGroup 6954 non-null float64 17 CabinDeck 6797 non-null object 18 CabinNum 6797 non-null float64 19 CabinSide 6797 non-null object 20 IsChild 6800 non-null object 21 TotalSpend 6223 non-null float64 22 GroupSize 6954 non-null int64 23 IsSingle 6954 non-null bool dtypes: bool(1), float64(10), int64(1), object(12) memory usage: 1.3+ MB
X_train.head()
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | ... | LastName | GroupId | NumInGroup | CabinDeck | CabinNum | CabinSide | IsChild | TotalSpend | GroupSize | IsSingle | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6309_02 | Earth | NaN | G/1023/S | TRAPPIST-1e | 7.0 | False | 0.0 | 0.0 | 0.0 | ... | Rodricker | 6309.0 | 2.0 | G | 1023.0 | S | True | 0.0 | 4 | False |
| 1 | 2908_02 | NaN | False | F/553/S | TRAPPIST-1e | 13.0 | False | 649.0 | 2.0 | 0.0 | ... | Berreranks | 2908.0 | 2.0 | F | 553.0 | S | True | 918.0 | 1 | True |
| 2 | 0548_01 | Earth | False | E/36/S | TRAPPIST-1e | 24.0 | False | 0.0 | 86.0 | 704.0 | ... | Webstes | 548.0 | 1.0 | E | 36.0 | S | False | 791.0 | 1 | True |
| 3 | 8757_01 | Earth | False | G/1409/S | TRAPPIST-1e | 28.0 | False | 1.0 | 1484.0 | 210.0 | ... | Danielps | 8757.0 | 1.0 | G | 1409.0 | S | False | 1700.0 | 1 | True |
| 4 | 1644_01 | Earth | False | F/327/P | 55 Cancri e | 14.0 | False | 0.0 | 2.0 | 0.0 | ... | Waltonnedy | 1644.0 | 1.0 | F | 327.0 | P | True | 796.0 | 1 | True |
5 rows × 24 columns
for_visual_df.loc[for_visual_df['HomePlanet'].notnull(), :].groupby(['GroupId'], as_index=False).agg(
HomePlanet_num = ('HomePlanet', lambda x: len(x.unique()))).sort_values('HomePlanet_num', ascending = False)
| GroupId | HomePlanet_num | |
|---|---|---|
| 0 | 1.0 | 1 |
| 4068 | 6128.0 | 1 |
| 4077 | 6141.0 | 1 |
| 4076 | 6139.0 | 1 |
| 4075 | 6138.0 | 1 |
| ... | ... | ... |
| 2033 | 3083.0 | 1 |
| 2032 | 3082.0 | 1 |
| 2031 | 3081.0 | 1 |
| 2030 | 3080.0 | 1 |
| 6106 | 9280.0 | 1 |
6107 rows × 2 columns
--> Menschen aus derselben Gruppe fliegen immer vom selben 'HomePlanet'
for_visual_df.loc[
for_visual_df['HomePlanet'].notnull(), :].groupby(['LastName'], as_index=False).agg(
num_planets=('HomePlanet', lambda x: len(x.unique()))).sort_values('num_planets', ascending = False)
| LastName | num_planets | |
|---|---|---|
| 0 | Acobson | 1 |
| 1475 | Parrett | 1 |
| 1469 | Panie | 1 |
| 1470 | Panspic | 1 |
| 1471 | Parbage | 1 |
| ... | ... | ... |
| 731 | Flynner | 1 |
| 730 | Flyncharlan | 1 |
| 729 | Floydendley | 1 |
| 728 | Flowensley | 1 |
| 2208 | Youngrayes | 1 |
2209 rows × 2 columns
--> Passagiere mit demselben 'LatName' fliegen von demselben Planeten.
def HomePlanet_update(df, source_df):
print('------ HomePlanet_update ------------')
# Menschen aus derselben Gruppe fliegen immer vom selben 'HomePlanet'
planet_for_group_df = source_df.loc[source_df['HomePlanet'].notnull(), :].groupby(['GroupId'], as_index=False).agg(
num_planets=('HomePlanet', lambda x: len(x.unique())),
not_null_planet = ('HomePlanet', lambda x: x.unique()[0]))
start_nulls_num = df.loc[df['HomePlanet'].isnull(), :].shape[0]
print('HomePlanet NULLs:', start_nulls_num)
df['HomePlanet'] = df.apply(
lambda x: (planet_for_group_df.loc[planet_for_group_df['GroupId'] == x.GroupId, 'not_null_planet'].iloc[0] if planet_for_group_df.loc[planet_for_group_df['GroupId'] == x.GroupId, 'not_null_planet'].shape[0] > 0 else np.nan)
if x.HomePlanet is np.nan else x.HomePlanet , axis=1)
print('1. (after replacement through the GroupId) HomePlanet NULLs:', df.loc[df['HomePlanet'].isnull(), :].shape[0])
#----------------------------------------------------------------------
# wenn Deck A, B, C; T, 'Europa'. Wenn G, dann 'Earth'
df.loc[(df['HomePlanet'].isnull()) & (df['CabinDeck'].isin(['A', 'B', 'C', 'T'])), 'HomePlanet']='Europa'
df.loc[(df['HomePlanet'].isnull()) & (df['CabinDeck'].isin(['G'])), 'HomePlanet']='Earth'
print('2. (after replacement through the Deck) HomePlanet NULLs:', df.loc[df['HomePlanet'].isnull(), :].shape[0])
# ---------------------------------------------------------------------
# Passagiere mit demselben 'LatName' fliegen von demselben Planeten
planet_for_lastname_df = source_df.loc[
source_df['HomePlanet'].notnull(), :].groupby(['LastName'], as_index=False).agg(
num_planets=('HomePlanet', lambda x: len(x.unique())),
not_null_planet = ('HomePlanet', lambda x: x.unique()[0]))
df['HomePlanet'] = df.apply(
lambda x: (planet_for_lastname_df.loc[planet_for_lastname_df['LastName'] == x.LastName, 'not_null_planet'].iloc[0] if planet_for_lastname_df.loc[planet_for_lastname_df['LastName'] == x.LastName, 'not_null_planet'].shape[0] > 0 else np.nan)
if x.HomePlanet is np.nan else x.HomePlanet , axis=1)
print('3. (after replacement through the LastName) HomePlanet NULLs:', df.loc[df['HomePlanet'].isnull(), :].shape[0])
#----------------------------------------------------------------------
df.loc[(df['HomePlanet'].isnull()) & (df['Destination'].isin(['TRAPPIST-1e', 'PSO J318.5-22'])) & (df['CabinDeck'] == 'D'), 'HomePlanet']='Mars'
df.loc[(df['HomePlanet'].isnull()) & ~(df['CabinDeck'] == 'D'), 'HomePlanet']='Earth'
df.loc[(df['HomePlanet'].isnull()) & (df['Destination'] == '55 Cancri e') & (df['CabinDeck'] == 'D'), 'HomePlanet']='Europa'
print('4. (after replacement through the Destination):', df.loc[df['HomePlanet'].isnull(), :].shape[0])
#------------------------------------------------------------------------
# für alle verbleibenden leeren Werte durch 'Earth' ersetzen
df.fillna(value= {'HomePlanet': 'Earth'}, inplace=True)
print('5. (after replacement by the most common value) HomePlanet NULLs:', df.loc[df['HomePlanet'].isnull(), :].shape[0])
return df
X_train = HomePlanet_update(X_train, for_visual_df)
------ HomePlanet_update ------------ HomePlanet NULLs: 166 1. (after replacement through the GroupId) HomePlanet NULLs: 94 2. (after replacement through the Deck) HomePlanet NULLs: 54 3. (after replacement through the LastName) HomePlanet NULLs: 7 4. (after replacement through the Destination): 0 5. (after replacement by the most common value) HomePlanet NULLs: 0
for_visual_df.loc[
for_visual_df['Destination'].notnull(), :].groupby(['LastName'], as_index=False).agg(
num_planets=('Destination', lambda x: len(x.unique())),
not_null_planet = ('Destination', lambda x: x.unique()[0])).sort_values('num_planets', ascending=False)
| LastName | num_planets | not_null_planet | |
|---|---|---|---|
| 1858 | Slable | 3 | PSO J318.5-22 |
| 2099 | Villenson | 3 | PSO J318.5-22 |
| 610 | Dotsondez | 3 | 55 Cancri e |
| 2102 | Vinozarks | 3 | TRAPPIST-1e |
| 2103 | Vinston | 3 | PSO J318.5-22 |
| ... | ... | ... | ... |
| 1018 | Howence | 1 | PSO J318.5-22 |
| 1017 | Howayery | 1 | TRAPPIST-1e |
| 1015 | Hotty | 1 | TRAPPIST-1e |
| 1012 | Horthy | 1 | TRAPPIST-1e |
| 2208 | Youngrayes | 1 | TRAPPIST-1e |
2209 rows × 3 columns
for_visual_df.loc[for_visual_df['Destination'].notnull(), :].groupby(['GroupId'], as_index=False).agg(
num_planets=('Destination', lambda x: len(x.unique())),
not_null_planet = ('Destination', lambda x: x.unique()[0])).sort_values('num_planets', ascending=False)
| GroupId | num_planets | not_null_planet | |
|---|---|---|---|
| 4426 | 6672.0 | 3 | TRAPPIST-1e |
| 885 | 1350.0 | 3 | 55 Cancri e |
| 5899 | 8956.0 | 3 | TRAPPIST-1e |
| 1916 | 2892.0 | 3 | PSO J318.5-22 |
| 5860 | 8886.0 | 3 | 55 Cancri e |
| ... | ... | ... | ... |
| 2477 | 3757.0 | 1 | TRAPPIST-1e |
| 2476 | 3756.0 | 1 | TRAPPIST-1e |
| 420 | 631.0 | 1 | TRAPPIST-1e |
| 2474 | 3754.0 | 1 | TRAPPIST-1e |
| 0 | 1.0 | 1 | TRAPPIST-1e |
6114 rows × 3 columns
--> Eins-zu-eins-Abhängigkeit zwischen Gruppennummer oder Nachname und Zielort nicht gefunden, leere Werte durch den häufigsten Zielort ersetzen
def Destination_update(df):
print('------ Destination_update ------------')
start_nulls_num = df.loc[df['Destination'].isnull(), :].shape[0]
print('Destination NULLs:', start_nulls_num)
df.fillna(value= {'Destination': 'TRAPPIST-1e'}, inplace=True)
print('(after replacement by the most common value) Destination NULLs:', df.loc[df['Destination'].isnull(), :].shape[0])
return df
X_train= Destination_update(X_train)
------ Destination_update ------------ Destination NULLs: 153 (after replacement by the most common value) Destination NULLs: 0
num_unique_last_names_in_group = for_visual_df[for_visual_df['GroupSize'] > 1].groupby(['GroupId'], as_index=False).agg(
lastname_num_in_group = ('LastName', lambda x: len(x.value_counts()))).groupby('lastname_num_in_group').size().sort_values()
num_unique_last_names_in_group
lastname_num_in_group 4 1 3 23 2 225 1 1163 dtype: int64
# Countplot of unique values
plt.figure(figsize=(10,4))
sns.barplot(x=num_unique_last_names_in_group.index, y=num_unique_last_names_in_group.values)
plt.title('Number of unique surnames by group')
Text(0.5, 1.0, 'Number of unique surnames by group')
for_visual_df.loc[for_visual_df['LastName'].notnull(), ['GroupId', 'LastName']].groupby(['GroupId'], as_index=False).agg(
not_null_last_name = ('LastName', lambda x: x.value_counts().index[0]))
| GroupId | not_null_last_name | |
|---|---|---|
| 0 | 1.0 | Ofracculy |
| 1 | 2.0 | Vines |
| 2 | 3.0 | Susent |
| 3 | 4.0 | Santantines |
| 4 | 5.0 | Hinetthews |
| ... | ... | ... |
| 6108 | 9275.0 | Conable |
| 6109 | 9276.0 | Noxnuther |
| 6110 | 9278.0 | Mondalley |
| 6111 | 9279.0 | Connon |
| 6112 | 9280.0 | Hontichre |
6113 rows × 2 columns
# suchen nach LastName in der gleichen Gruppe und aktualisiren null Werte
def LastName_update(df):
print('------ LastName_update ------------')
print('LastName NULLs:', df.loc[df['LastName'].isnull()].shape[0])
lastnames_for_nulls_df = df.loc[df['LastName'].notnull(), ['GroupId', 'LastName']].groupby(['GroupId'], as_index=False).agg(
not_null_last_name = ('LastName', lambda x: x.value_counts().index[0]))
print(lastnames_for_nulls_df.shape)
df['LastName'] = df.apply(
lambda x: (
lastnames_for_nulls_df.loc[lastnames_for_nulls_df['GroupId'] == x.GroupId, 'not_null_last_name'].iloc[0]
if lastnames_for_nulls_df.loc[lastnames_for_nulls_df['GroupId'] == x.GroupId, 'not_null_last_name'].shape[0] > 0 else np.nan)
if x.LastName is np.nan else x.LastName , axis=1)
print('(after update) LastName NULLs:', df.loc[df['LastName'].isnull()].shape[0])
#--------------------------------------------------------
# die abhängigen Felder aktualisieren
namesakes_in_group_df = df.groupby(['LastName', 'GroupId'], as_index=False).agg(
namesakes_num_in_group = ('PassengerId', 'count'))
df = df.merge(
namesakes_in_group_df, how='left',
left_on=['LastName', 'GroupId'],
right_on=['LastName', 'GroupId'])
df['namesakes_num_in_group'] = df['namesakes_num_in_group'] - 1
print('namesakes_num_in_group NULLs:', df.loc[df['namesakes_num_in_group'].isnull()].shape[0])
df.fillna(value= {'namesakes_num_in_group': 0}, inplace=True)
print('namesakes_num_in_group NULLs:', df.loc[df['namesakes_num_in_group'].isnull()].shape[0])
#----------------------------------------------------------------
df.loc[df['Name'].isnull(), 'Name'] = df.loc[df['Name'].isnull(), 'LastName']
df['NameLength'] = df.loc[:, 'Name'].str.len()
df.fillna(value= {'NameLength': 0}, inplace=True)
return df
X_train = LastName_update(X_train)
------ LastName_update ------------ LastName NULLs: 165 (5127, 2) (after update) LastName NULLs: 95 namesakes_num_in_group NULLs: 95 namesakes_num_in_group NULLs: 0
tmp = X_train.loc[X_train['CabinDeck'].notnull(), ['GroupId', 'CabinDeck']].groupby(['GroupId'], as_index=False).agg(
not_null_cabin_deck = ('CabinDeck', lambda x: len(x.value_counts()))).sort_values('not_null_cabin_deck')
tmp.groupby('not_null_cabin_deck').size()
not_null_cabin_deck 1 4826 2 300 3 13 dtype: int64
def CabinDeck_update(df):
print('------ CabinDeck_update ------------')
print('CabinDeck NULLs:', df.loc[df['CabinDeck'].isnull()].shape[0])
cabindecks_for_nulls_df = df.loc[df['CabinDeck'].notnull(), ['GroupId', 'CabinDeck']].groupby(['GroupId'], as_index=False).agg(
not_null_cabin_deck = ('CabinDeck', lambda x: x.value_counts().index[0])).sort_values('not_null_cabin_deck')
df['CabinDeck'] = df.apply(
lambda x: (
cabindecks_for_nulls_df.loc[cabindecks_for_nulls_df['GroupId'] == x.GroupId, 'not_null_cabin_deck'].iloc[0]
if cabindecks_for_nulls_df.loc[cabindecks_for_nulls_df['GroupId'] == x.GroupId, 'not_null_cabin_deck'].shape[0] > 0 else np.nan)
if x.CabinDeck is np.nan else x.CabinDeck , axis=1)
print('(after update throw GroupId) CabinDeck NULLs:', df.loc[df['CabinDeck'].isnull()].shape[0])
#---------------------------------------------------------------------------------
# Mars -> F
# Europa -> B
# Earth -> G
df.loc[(df['CabinDeck'].isnull()) & (df['HomePlanet'] == 'Mars'), 'CabinDeck'] = 'F'
df.loc[(df['CabinDeck'].isnull()) & (df['HomePlanet'] == 'Earth'), 'CabinDeck'] = 'G'
df.loc[(df['CabinDeck'].isnull()) & (df['HomePlanet'] == 'Europa'), 'CabinDeck'] = 'B'
print('(after update throw HomePlanet) CabinDeck NULLs:', df.loc[df['CabinDeck'].isnull()].shape[0])
return df
X_train = CabinDeck_update(X_train)
------ CabinDeck_update ------------ CabinDeck NULLs: 157 (after update throw GroupId) CabinDeck NULLs: 83 (after update throw HomePlanet) CabinDeck NULLs: 0
decks = sorted(list(for_visual_df.CabinDeck.value_counts().index))
decks
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'T']
def CabinNum_update(df, source_data):
print('------ CabinNum_update ------------')
null_ind = df.loc[df['CabinNum'].isnull()].index
print('CabinNum NULLs:', df.loc[df['CabinNum'].isnull()].shape[0])
decks = sorted(list(source_data.CabinDeck.value_counts().index))
print(decks)
df_y_pred= []
for deck in decks:
df_x_pred = df.loc[(df['CabinNum'].isnull()) & (df['CabinDeck'] == deck), ['GroupId']]
if df_x_pred.shape[0] > 0:
df_x_train = df.loc[(df['CabinNum'].notnull()) & (df['CabinDeck'] == deck), ['GroupId']]
df_y_train = df.loc[(df['CabinNum'].notnull()) & (df['CabinDeck'] == deck), ['CabinNum']]
lr_mod = LinearRegression()
lr_mod.fit(df_x_train, df_y_train)
df_y_pred = lr_mod.predict(df_x_pred)
print(deck, df_y_pred.shape)
df.loc[(df['CabinNum'].isnull()) & (df['CabinDeck'] == deck), ['CabinNum']] = df_y_pred
print('CabinNum NULLs:', df.loc[df['CabinNum'].isnull()].shape[0])
return (null_ind, df)
null_cabin_num_ind, X_train = CabinNum_update(X_train, for_visual_df)
------ CabinNum_update ------------ CabinNum NULLs: 157 ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'T'] A (6, 1) B (29, 1) C (14, 1) D (5, 1) E (6, 1) F (31, 1) G (66, 1) CabinNum NULLs: 0
fig, axs = plt.subplots(figsize=(12, 6))
sns.scatterplot(data=X_train.loc[null_cabin_num_ind, :], x='CabinNum', y='GroupId', hue='CabinDeck', ax=axs)
<AxesSubplot:xlabel='CabinNum', ylabel='GroupId'>
tmp = X_train.loc[X_train['CabinSide'].notnull(), ['GroupId', 'CabinSide']].groupby(['GroupId'], as_index=False).agg(
not_null_cabin_side = ('CabinSide', lambda x: len(x.value_counts()))).sort_values('not_null_cabin_side')
tmp.groupby('not_null_cabin_side').size()
not_null_cabin_side 1 5139 dtype: int64
def CabinSide_update(df, source_data):
print('------ CabinSide_update ------------')
print('CabinSide NULLs:', df.loc[df['CabinSide'].isnull()].shape[0])
cabin_side_for_group_id_df = source_data.loc[
source_data['CabinSide'].notnull(), :].groupby(['GroupId'], as_index=False).agg(
not_null_cabin_side = ('CabinSide', lambda x: x.unique()[0]))
df['CabinSide'] = df.apply(
lambda x:
(cabin_side_for_group_id_df.loc[cabin_side_for_group_id_df['GroupId'] == x.GroupId, 'not_null_cabin_side'].iloc[0] if cabin_side_for_group_id_df.loc[cabin_side_for_group_id_df['GroupId'] == x.GroupId, 'not_null_cabin_side'].shape[0] > 0 else np.nan)
if x.CabinSide is np.nan else x.CabinSide , axis=1)
print('CabinSide NULLs:', df.loc[df['CabinSide'].isnull()].shape[0])
#-----------------------------------------------------
df.fillna(value= {'CabinSide': 'O'}, inplace=True)
print('CabinSide NULLs:', df.loc[df['CabinSide'].isnull()].shape[0])
return df
X_train = CabinSide_update(X_train, for_visual_df)
------ CabinSide_update ------------ CabinSide NULLs: 157 CabinSide NULLs: 81 CabinSide NULLs: 0
X_train.CabinSide.unique()
array(['S', 'P', 'O'], dtype=object)
def VIP_update(df):
print('------ VIP_update ------------')
print('VIP NULLs:', df.loc[df['VIP'].isnull()].shape[0])
df.fillna(value= {'VIP': False}, inplace=True)
print('VIP NULLs:', df.loc[df['VIP'].isnull()].shape[0])
return df
X_train = VIP_update(X_train)
------ VIP_update ------------ VIP NULLs: 166 VIP NULLs: 0
for_visual_df.loc[
(for_visual_df['CryoSleep'].notnull())
& (for_visual_df['CryoSleep'])
& (for_visual_df['TotalSpend'].notnull()), 'TotalSpend'].sum()
0.0
def CryoSleep_update(df):
print('------ CryoSleep_update ------------')
print(df.loc[df['CryoSleep'].isnull(), :].shape[0])
df.loc[(df['CryoSleep'].isnull()) & (df['TotalSpend'] == 0), 'CryoSleep'] = True
print(df.loc[df['CryoSleep'].isnull(), :].shape[0])
#------------------------------------------------------------
df.fillna(value= {'CryoSleep': False}, inplace=True)
print(df.loc[df['CryoSleep'].isnull(), :].shape[0])
return df
X_train = CryoSleep_update(X_train)
------ CryoSleep_update ------------ 165 101 0
def get_age_medians(df):
return df.loc[:, ['IsSingle', 'HomePlanet', 'VIP', 'Age']].groupby(
['IsSingle', 'HomePlanet', 'VIP'],
as_index=False).agg('median')
def Age_update(df, median_values):
print('------ Age_update ------------')
print('Nulls in Expenses:', df.loc[:, 'Age'].isna().sum())
for index, row in median_values.iterrows():
cond = (df['IsSingle'] == row['IsSingle']) & (df['VIP'] == row['VIP']) & (df['HomePlanet'] == row['HomePlanet'])
df.loc[cond, :] = df.loc[cond, :].fillna(value = {'Age': row['Age']})
print('Nulls in Expenses:', df.loc[:, 'Age'].isna().sum())
#-----------------------------------------------------------------------------
df.fillna(value= {'Age': 0}, inplace=True)
print('Nulls in Expenses:', df.loc[:, 'Age'].isna().sum())
return df
age_medians_train = get_age_medians(X_train)
X_train = Age_update(X_train, age_medians_train)
------ Age_update ------------ Nulls in Expenses: 154 Nulls in Expenses: 0 Nulls in Expenses: 0
group_cols = ['IsSingle', 'VIP', 'HomePlanet']
cols= expenses_cols + group_cols
not_child_not_sleep_df = X_train.loc[(X_train['Age'] >= 13) & (X_train['CryoSleep'] == False), :]
print(not_child_not_sleep_df.shape)
no_spend_df = not_child_not_sleep_df.loc[
(not_child_not_sleep_df['ShoppingMall']==0) &
(not_child_not_sleep_df['FoodCourt']==0) &
(not_child_not_sleep_df['RoomService']==0) &
(not_child_not_sleep_df['Spa']==0) &
(not_child_not_sleep_df['VRDeck']==0), :
]
print(no_spend_df.shape)
expenses_means = not_child_not_sleep_df.loc[:, cols].groupby(group_cols, as_index=False).agg('mean')
expenses_means
(4122, 26) (80, 26)
| IsSingle | VIP | HomePlanet | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | |
|---|---|---|---|---|---|---|---|---|
| 0 | False | False | Earth | 232.229602 | 219.576923 | 206.981025 | 245.934489 | 215.865275 |
| 1 | False | False | Europa | 302.703704 | 2587.694323 | 289.389868 | 1612.439560 | 1593.158242 |
| 2 | False | False | Mars | 1016.549153 | 106.812081 | 510.896907 | 196.180272 | 90.174497 |
| 3 | False | True | Europa | 438.547170 | 2905.092593 | 188.452830 | 1220.000000 | 2686.283019 |
| 4 | False | True | Mars | 874.750000 | 146.750000 | 657.210526 | 176.476190 | 44.000000 |
| 5 | True | False | Earth | 217.863158 | 217.633958 | 211.233844 | 220.979006 | 214.701048 |
| 6 | True | False | Europa | 203.042017 | 2618.831933 | 288.394958 | 1474.092697 | 1547.339833 |
| 7 | True | False | Mars | 990.223214 | 101.777778 | 583.811947 | 167.164080 | 91.442953 |
| 8 | True | True | Europa | 359.055556 | 2992.617647 | 223.971429 | 1439.722222 | 1835.472222 |
| 9 | True | True | Mars | 860.062500 | 108.516129 | 358.878788 | 231.303030 | 51.848485 |
expenses_medians = not_child_not_sleep_df.loc[:, cols].groupby(group_cols, as_index=False).agg('median')
expenses_medians
| IsSingle | VIP | HomePlanet | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | |
|---|---|---|---|---|---|---|---|---|
| 0 | False | False | Earth | 5.0 | 4.0 | 4.0 | 2.0 | 7.0 |
| 1 | False | False | Europa | 0.0 | 1421.0 | 0.0 | 445.0 | 473.0 |
| 2 | False | False | Mars | 853.0 | 0.0 | 170.0 | 0.0 | 0.0 |
| 3 | False | True | Europa | 0.0 | 1525.0 | 0.0 | 417.0 | 1547.0 |
| 4 | False | True | Mars | 788.0 | 0.0 | 354.0 | 4.0 | 0.0 |
| 5 | True | False | Earth | 3.0 | 4.0 | 4.0 | 6.0 | 4.0 |
| 6 | True | False | Europa | 0.0 | 1389.0 | 0.0 | 272.5 | 438.0 |
| 7 | True | False | Mars | 726.5 | 0.0 | 268.0 | 0.0 | 0.0 |
| 8 | True | True | Europa | 0.0 | 2343.0 | 0.0 | 269.0 | 680.0 |
| 9 | True | True | Mars | 687.5 | 0.0 | 177.0 | 0.0 | 0.0 |
for index, row in expenses_medians.iterrows():
print(row[expenses_cols].to_dict())
print('---------------------------')
{'RoomService': 5.0, 'FoodCourt': 4.0, 'ShoppingMall': 4.0, 'Spa': 2.0, 'VRDeck': 7.0}
---------------------------
{'RoomService': 0.0, 'FoodCourt': 1421.0, 'ShoppingMall': 0.0, 'Spa': 445.0, 'VRDeck': 473.0}
---------------------------
{'RoomService': 853.0, 'FoodCourt': 0.0, 'ShoppingMall': 170.0, 'Spa': 0.0, 'VRDeck': 0.0}
---------------------------
{'RoomService': 0.0, 'FoodCourt': 1525.0, 'ShoppingMall': 0.0, 'Spa': 417.0, 'VRDeck': 1547.0}
---------------------------
{'RoomService': 788.0, 'FoodCourt': 0.0, 'ShoppingMall': 354.0, 'Spa': 4.0, 'VRDeck': 0.0}
---------------------------
{'RoomService': 3.0, 'FoodCourt': 4.0, 'ShoppingMall': 4.0, 'Spa': 6.0, 'VRDeck': 4.0}
---------------------------
{'RoomService': 0.0, 'FoodCourt': 1389.0, 'ShoppingMall': 0.0, 'Spa': 272.5, 'VRDeck': 438.0}
---------------------------
{'RoomService': 726.5, 'FoodCourt': 0.0, 'ShoppingMall': 268.0, 'Spa': 0.0, 'VRDeck': 0.0}
---------------------------
{'RoomService': 0.0, 'FoodCourt': 2343.0, 'ShoppingMall': 0.0, 'Spa': 269.0, 'VRDeck': 680.0}
---------------------------
{'RoomService': 687.5, 'FoodCourt': 0.0, 'ShoppingMall': 177.0, 'Spa': 0.0, 'VRDeck': 0.0}
---------------------------
def get_expenses_means(df):
not_child_not_sleep_df = df.loc[(X_train['Age'] >= 13) & (df['CryoSleep'] == False), :]
expenses_means = not_child_not_sleep_df.loc[:, cols].groupby(group_cols, as_index=False).agg('mean')
return expenses_means
def Expenses_update(df, median_values):
print('------ Expenses_update ------------')
values = {"RoomService": 0, "FoodCourt": 0, "ShoppingMall": 0, "Spa": 0, 'VRDeck': 0}
print('Nulls in Expenses:', df.loc[:, expenses_cols].isna().sum().sum(axis = 0))
df.loc[(df['Age'] < 13), :] = df.loc[(df['Age'] < 13), :].fillna(value = values)
print('Nulls in Expenses:', df.loc[:, expenses_cols].isna().sum().sum(axis = 0))
#-------------------------------------------------------------------------------
df.loc[(df['CryoSleep'].notnull())
& (df['CryoSleep']), :] = df.loc[(df['CryoSleep'].notnull()) & (df['CryoSleep']), :].fillna(value = values)
print('Nulls in Expenses:', df.loc[:, expenses_cols].isna().sum().sum(axis = 0))
#-------------------------------------------------------------------------
for index, row in median_values.iterrows():
cond = (df['IsSingle'] == row['IsSingle']) & (df['VIP'] == row['VIP']) & (df['HomePlanet'] == row['HomePlanet'])
tmp = df.loc[cond, expenses_cols]
print(tmp.shape)
df.loc[cond, :] = df.loc[cond, :].fillna(value = row[expenses_cols].to_dict())
print('Nulls in Expenses:', df.loc[:, expenses_cols].isna().sum().sum(axis = 0))
#-----------------------------------------------------------------------------
df.fillna(value= values, inplace=True)
print('Nulls in Expenses:', df.loc[:, expenses_cols].isna().sum().sum(axis = 0))
return df
expenses_means_train = get_expenses_means(X_train)
X_train = Expenses_update(X_train, expenses_means_train)
------ Expenses_update ------------ Nulls in Expenses: 755 Nulls in Expenses: 681 Nulls in Expenses: 426 (1177, 5) (923, 5) (615, 5) (64, 5) (21, 5) (2624, 5) (683, 5) (770, 5) (44, 5) (33, 5) Nulls in Expenses: 0 Nulls in Expenses: 0
def log_expenses(df):
print('------ log_expenses ------------')
num_features = ['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck' ]
for f in num_features:
df[f] = np.log10(df[f] + 1)
return df
X_train = log_expenses(X_train)
------ log_expenses ------------
def New_features_update(df):
print('------ New_features_update ------------')
df['TotalSpend'] = df.loc[:, expenses_cols].sum(axis=1)
df['NoSpend'] = df.loc[:, 'TotalSpend'].apply(lambda x: np.nan if math.isnan(x) else x==0)
df['IsChild'] = df.loc[:, 'Age'] < 18
df['Route'] = df['HomePlanet'] + ' - ' + df['Destination']
return df
X_train = New_features_update(X_train)
------ New_features_update ------------
X_train.isnull().sum()
PassengerId 0 HomePlanet 0 CryoSleep 0 Cabin 157 Destination 0 Age 0 VIP 0 RoomService 0 FoodCourt 0 ShoppingMall 0 Spa 0 VRDeck 0 Name 95 FirstName 165 LastName 95 GroupId 0 NumInGroup 0 CabinDeck 0 CabinNum 0 CabinSide 0 IsChild 0 TotalSpend 0 GroupSize 0 IsSingle 0 namesakes_num_in_group 0 NameLength 0 NoSpend 0 Route 0 dtype: int64
X_train.drop(['Cabin', 'PassengerId', 'Name', 'FirstName', 'LastName'], axis=1, inplace=True)
X_train.describe()
| Age | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | GroupId | NumInGroup | CabinNum | TotalSpend | GroupSize | namesakes_num_in_group | NameLength | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 | 6954.000000 |
| mean | 28.704630 | 0.792112 | 0.853224 | 0.726385 | 0.818248 | 0.783067 | 4628.054357 | 1.522577 | 599.634978 | 3.973037 | 1.838079 | 0.669255 | 13.583405 |
| std | 14.320889 | 1.200016 | 1.277488 | 1.129154 | 1.205654 | 1.200326 | 2662.087544 | 1.064101 | 508.683487 | 3.802163 | 1.373543 | 1.152245 | 3.057250 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 |
| 25% | 19.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2308.000000 | 1.000000 | 170.000000 | 0.000000 | 1.000000 | 0.000000 | 12.000000 |
| 50% | 27.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4643.000000 | 1.000000 | 428.500000 | 4.446009 | 1.000000 | 0.000000 | 14.000000 |
| 75% | 37.000000 | 1.819544 | 1.959041 | 1.518514 | 1.812913 | 1.724276 | 6864.000000 | 2.000000 | 998.000000 | 7.003803 | 2.000000 | 1.000000 | 16.000000 |
| max | 79.000000 | 3.996555 | 4.474420 | 4.370938 | 4.350422 | 4.382629 | 9280.000000 | 8.000000 | 1891.000000 | 17.251366 | 8.000000 | 6.000000 | 18.000000 |
X_train.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 6954 entries, 0 to 6953 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 HomePlanet 6954 non-null object 1 CryoSleep 6954 non-null bool 2 Destination 6954 non-null object 3 Age 6954 non-null float64 4 VIP 6954 non-null bool 5 RoomService 6954 non-null float64 6 FoodCourt 6954 non-null float64 7 ShoppingMall 6954 non-null float64 8 Spa 6954 non-null float64 9 VRDeck 6954 non-null float64 10 GroupId 6954 non-null float64 11 NumInGroup 6954 non-null float64 12 CabinDeck 6954 non-null object 13 CabinNum 6954 non-null float64 14 CabinSide 6954 non-null object 15 IsChild 6954 non-null bool 16 TotalSpend 6954 non-null float64 17 GroupSize 6954 non-null int64 18 IsSingle 6954 non-null bool 19 namesakes_num_in_group 6954 non-null float64 20 NameLength 6954 non-null float64 21 NoSpend 6954 non-null bool 22 Route 6954 non-null object dtypes: bool(5), float64(12), int64(1), object(5) memory usage: 1.3+ MB
def bool_to_int(df):
print('------ bool_to_int ------------')
df['CryoSleep'] = df["CryoSleep"].astype(int)
df['VIP'] = df["VIP"].astype(int)
df['IsChild'] = df["IsChild"].astype(int)
df['IsSingle'] = df["IsSingle"].astype(int)
df['NoSpend'] = df["NoSpend"].astype(int)
return df
X_train = bool_to_int(X_train)
------ bool_to_int ------------
# ---------------------------------------------------------------------------------------------
df_test = pd.read_csv("test.csv", sep=',', engine='python')
df_test
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0013_01 | Earth | True | G/3/S | TRAPPIST-1e | 27.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Nelly Carsoning |
| 1 | 0018_01 | Earth | False | F/4/S | TRAPPIST-1e | 19.0 | False | 0.0 | 9.0 | 0.0 | 2823.0 | 0.0 | Lerome Peckers |
| 2 | 0019_01 | Europa | True | C/0/S | 55 Cancri e | 31.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Sabih Unhearfus |
| 3 | 0021_01 | Europa | False | C/1/S | TRAPPIST-1e | 38.0 | False | 0.0 | 6652.0 | 0.0 | 181.0 | 585.0 | Meratz Caltilter |
| 4 | 0023_01 | Earth | False | F/5/S | TRAPPIST-1e | 20.0 | False | 10.0 | 0.0 | 635.0 | 0.0 | 0.0 | Brence Harperez |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4272 | 9266_02 | Earth | True | G/1496/S | TRAPPIST-1e | 34.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Jeron Peter |
| 4273 | 9269_01 | Earth | False | NaN | TRAPPIST-1e | 42.0 | False | 0.0 | 847.0 | 17.0 | 10.0 | 144.0 | Matty Scheron |
| 4274 | 9271_01 | Mars | True | D/296/P | 55 Cancri e | NaN | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Jayrin Pore |
| 4275 | 9273_01 | Europa | False | D/297/P | NaN | NaN | False | 0.0 | 2680.0 | 0.0 | 0.0 | 523.0 | Kitakan Conale |
| 4276 | 9277_01 | Earth | True | G/1498/S | PSO J318.5-22 | 43.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Lilace Leonzaley |
4277 rows × 13 columns
df_test[['GroupId', 'NumInGroup']] = df_test['PassengerId'].str.split('_', expand=True)
df_test['GroupId'] = df_test.loc[:,'GroupId'].astype(float)
pd.merge(X_train, df_test, how ='inner', on =['GroupId', 'GroupId'])
| HomePlanet_x | CryoSleep_x | Destination_x | Age_x | VIP_x | RoomService_x | FoodCourt_x | ShoppingMall_x | Spa_x | VRDeck_x | ... | Destination_y | Age_y | VIP_y | RoomService_y | FoodCourt_y | ShoppingMall_y | Spa_y | VRDeck_y | Name | NumInGroup_y |
|---|
0 rows × 37 columns
def encode_data(df):
numeric_selector = make_column_selector(dtype_include=np.number)
numeric_columns = numeric_selector(X_train)
print(numeric_columns)
nominal_columns = ["HomePlanet", "Destination", 'CabinSide', 'CabinDeck', 'Route']
nominal_categories = [list(sorted(X_train[column].unique())) for column in nominal_columns]
print(nominal_categories)
nominal_encoder = OneHotEncoder(categories=nominal_categories, sparse=False)
nominal_encoder.fit(X_train.loc[:, nominal_columns])
X_train_nominal = nominal_encoder.transform(X_train.loc[:, nominal_columns])
X_train_nominal = pd.DataFrame(
X_train_nominal,
columns=nominal_encoder.get_feature_names(),
index=X_train.index
)
X_train = pd.concat((X_train_nominal, X_train.drop(columns=nominal_columns)), axis=1)
# numerische features skalieren
scaler = MinMaxScaler()
scaler.fit(X_train[numeric_columns])
X_train.loc[:, numeric_columns] = scaler.transform(X_train[numeric_columns])
# Box Plot erzeugen
plt.figure(figsize=(15, 8))
X_train.loc[:, numeric_columns].boxplot(rot=90)
numeric_selector = make_column_selector(dtype_include=np.number)
numeric_columns = numeric_selector(X_train)
numeric_columns
['CryoSleep', 'Age', 'VIP', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'GroupId', 'NumInGroup', 'CabinNum', 'IsChild', 'TotalSpend', 'GroupSize', 'IsSingle', 'namesakes_num_in_group', 'NameLength', 'NoSpend']
nominal_columns = ["HomePlanet", "Destination", 'CabinSide', 'CabinDeck', 'Route']
nominal_categories = [list(sorted(X_train[column].unique())) for column in nominal_columns]
print(nominal_categories)
nominal_encoder = OneHotEncoder(categories=nominal_categories, sparse=False)
nominal_encoder.fit(X_train.loc[:, nominal_columns])
X_train_nominal = nominal_encoder.transform(X_train.loc[:, nominal_columns])
X_train_nominal = pd.DataFrame(
X_train_nominal,
columns=nominal_encoder.get_feature_names(),
index=X_train.index
)
X_train_nominal
[['Earth', 'Europa', 'Mars'], ['55 Cancri e', 'PSO J318.5-22', 'TRAPPIST-1e'], ['O', 'P', 'S'], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'T'], ['Earth - 55 Cancri e', 'Earth - PSO J318.5-22', 'Earth - TRAPPIST-1e', 'Europa - 55 Cancri e', 'Europa - PSO J318.5-22', 'Europa - TRAPPIST-1e', 'Mars - 55 Cancri e', 'Mars - PSO J318.5-22', 'Mars - TRAPPIST-1e']]
| x0_Earth | x0_Europa | x0_Mars | x1_55 Cancri e | x1_PSO J318.5-22 | x1_TRAPPIST-1e | x2_O | x2_P | x2_S | x3_A | ... | x3_T | x4_Earth - 55 Cancri e | x4_Earth - PSO J318.5-22 | x4_Earth - TRAPPIST-1e | x4_Europa - 55 Cancri e | x4_Europa - PSO J318.5-22 | x4_Europa - TRAPPIST-1e | x4_Mars - 55 Cancri e | x4_Mars - PSO J318.5-22 | x4_Mars - TRAPPIST-1e | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6949 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6950 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6951 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 6952 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 6953 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
6954 rows × 26 columns
X_train = pd.concat((X_train_nominal, X_train.drop(columns=nominal_columns)), axis=1)
X_train.head()
| x0_Earth | x0_Europa | x0_Mars | x1_55 Cancri e | x1_PSO J318.5-22 | x1_TRAPPIST-1e | x2_O | x2_P | x2_S | x3_A | ... | GroupId | NumInGroup | CabinNum | IsChild | TotalSpend | GroupSize | IsSingle | namesakes_num_in_group | NameLength | NoSpend | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 6309.0 | 2.0 | 1023.0 | 1 | 0.000000 | 4 | 0 | 2.0 | 15.0 | 1 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 2908.0 | 2.0 | 553.0 | 1 | 7.545307 | 1 | 1 | 0.0 | 16.0 | 0 |
| 2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 548.0 | 1.0 | 36.0 | 0 | 5.088738 | 1 | 1 | 0.0 | 14.0 | 0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 8757.0 | 1.0 | 1409.0 | 0 | 6.575190 | 1 | 1 | 0.0 | 13.0 | 0 |
| 4 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 1644.0 | 1.0 | 327.0 | 1 | 3.377488 | 1 | 1 | 0.0 | 16.0 | 0 |
5 rows × 44 columns
# numeric_columns.append('CabinDeck')
# Box Plot erzeugen
plt.figure(figsize=(15, 8))
X_train[numeric_columns].boxplot(rot=90)
<AxesSubplot:>
# numerische features skalieren
scaler = MinMaxScaler()
scaler.fit(X_train[numeric_columns])
X_train.loc[:, numeric_columns] = scaler.transform(X_train[numeric_columns])
# Box Plot erzeugen
plt.figure(figsize=(15, 8))
X_train.loc[:, numeric_columns].boxplot(rot=90)
<AxesSubplot:>
X_train
| x0_Earth | x0_Europa | x0_Mars | x1_55 Cancri e | x1_PSO J318.5-22 | x1_TRAPPIST-1e | x2_O | x2_P | x2_S | x3_A | ... | GroupId | NumInGroup | CabinNum | IsChild | TotalSpend | GroupSize | IsSingle | namesakes_num_in_group | NameLength | NoSpend | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.679815 | 0.142857 | 0.540984 | 1.0 | 0.000000 | 0.428571 | 0.0 | 0.333333 | 0.833333 | 1.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.313288 | 0.142857 | 0.292438 | 1.0 | 0.437374 | 0.000000 | 1.0 | 0.000000 | 0.888889 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.058950 | 0.000000 | 0.019038 | 0.0 | 0.294976 | 0.000000 | 1.0 | 0.000000 | 0.777778 | 0.0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.943636 | 0.000000 | 0.745108 | 0.0 | 0.381140 | 0.000000 | 1.0 | 0.000000 | 0.722222 | 0.0 |
| 4 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.177066 | 0.000000 | 0.172924 | 1.0 | 0.195781 | 0.000000 | 1.0 | 0.000000 | 0.888889 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6949 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.155620 | 0.000000 | 0.116869 | 1.0 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.833333 | 1.0 |
| 6950 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.106261 | 0.000000 | 0.107879 | 1.0 | 0.406790 | 0.000000 | 1.0 | 0.000000 | 0.777778 | 0.0 |
| 6951 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.494773 | 0.285714 | 0.455843 | 0.0 | 0.000000 | 0.428571 | 0.0 | 0.500000 | 0.611111 | 1.0 |
| 6952 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.650393 | 0.142857 | 0.118985 | 1.0 | 0.497925 | 0.285714 | 0.0 | 0.333333 | 0.888889 | 0.0 |
| 6953 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.061429 | 0.000000 | 0.011105 | 0.0 | 0.000000 | 0.571429 | 0.0 | 0.666667 | 0.777778 | 1.0 |
6954 rows × 44 columns
X_train.dtypes
x0_Earth float64 x0_Europa float64 x0_Mars float64 x1_55 Cancri e float64 x1_PSO J318.5-22 float64 x1_TRAPPIST-1e float64 x2_O float64 x2_P float64 x2_S float64 x3_A float64 x3_B float64 x3_C float64 x3_D float64 x3_E float64 x3_F float64 x3_G float64 x3_T float64 x4_Earth - 55 Cancri e float64 x4_Earth - PSO J318.5-22 float64 x4_Earth - TRAPPIST-1e float64 x4_Europa - 55 Cancri e float64 x4_Europa - PSO J318.5-22 float64 x4_Europa - TRAPPIST-1e float64 x4_Mars - 55 Cancri e float64 x4_Mars - PSO J318.5-22 float64 x4_Mars - TRAPPIST-1e float64 CryoSleep float64 Age float64 VIP float64 RoomService float64 FoodCourt float64 ShoppingMall float64 Spa float64 VRDeck float64 GroupId float64 NumInGroup float64 CabinNum float64 IsChild float64 TotalSpend float64 GroupSize float64 IsSingle float64 namesakes_num_in_group float64 NameLength float64 NoSpend float64 dtype: object
X_train.memory_usage(deep= True)
Index 319864 x0_Earth 55632 x0_Europa 55632 x0_Mars 55632 x1_55 Cancri e 55632 x1_PSO J318.5-22 55632 x1_TRAPPIST-1e 55632 x2_O 55632 x2_P 55632 x2_S 55632 x3_A 55632 x3_B 55632 x3_C 55632 x3_D 55632 x3_E 55632 x3_F 55632 x3_G 55632 x3_T 55632 x4_Earth - 55 Cancri e 55632 x4_Earth - PSO J318.5-22 55632 x4_Earth - TRAPPIST-1e 55632 x4_Europa - 55 Cancri e 55632 x4_Europa - PSO J318.5-22 55632 x4_Europa - TRAPPIST-1e 55632 x4_Mars - 55 Cancri e 55632 x4_Mars - PSO J318.5-22 55632 x4_Mars - TRAPPIST-1e 55632 CryoSleep 55632 Age 55632 VIP 55632 RoomService 55632 FoodCourt 55632 ShoppingMall 55632 Spa 55632 VRDeck 55632 GroupId 55632 NumInGroup 55632 CabinNum 55632 IsChild 55632 TotalSpend 55632 GroupSize 55632 IsSingle 55632 namesakes_num_in_group 55632 NameLength 55632 NoSpend 55632 dtype: int64
def memory_optimiz(df):
print('------ memory_optimiz ------------')
df_opti = df.apply(pd.to_numeric, downcast="float")
reduction = (df.memory_usage(deep=True).sum()-df_opti.memory_usage(deep=True).sum())/df.memory_usage(deep=True).sum() * 100
print(f"Reduction = {reduction:0.2f}%")
return df_opti
X_train = memory_optimiz(X_train)
------ memory_optimiz ------------ Reduction = 44.22%
X_train_tmp = X_train.copy()
X_train_tmp['Transported'] = y_train
corrmat = X_train_tmp.corr()
fig = plt.figure(figsize=(25, 25))
sns.heatmap(corrmat, vmax=1, vmin=-1, square=True, cmap="seismic", annot=True)
<AxesSubplot:>
def data_transform(df, source_data, *args):
age_medians = []
expenses_means = []
df = new_features_create(df)
df = HomePlanet_update(df, source_data)
df= Destination_update(df)
df = LastName_update(df)
df = CabinDeck_update(df)
null_cabin_num_ind, df = CabinNum_update(df, source_data)
df = CabinSide_update(df, source_data)
df = VIP_update(df)
df = CryoSleep_update(df)
if len(args) == 0:
age_medians = get_age_medians(df)
expenses_means = get_expenses_means(df)
else:
age_medians = args[0]
expenses_means = args[1]
df = Age_update(df, age_medians)
df = Expenses_update(df, expenses_means)
df = log_expenses(df)
df = New_features_update(df)
df.drop(['Cabin', 'PassengerId', 'Name', 'FirstName', 'LastName'], axis=1, inplace=True)
df = bool_to_int(df)
return df, age_medians, expenses_means
def encode_data(df, nominal_encoder, scaler):
df_nominal = nominal_encoder.transform(df.loc[:, nominal_columns])
df_nominal = pd.DataFrame(
df_nominal,
columns=nominal_encoder.get_feature_names(),
index=df.index
)
df_nominal
df = pd.concat((df_nominal, df.drop(columns=nominal_columns)), axis=1)
df.loc[:, numeric_columns] = scaler.transform(df[numeric_columns])
return df
X_test, am, em = data_transform(X_test, for_visual_df, age_medians_train, expenses_means_train)
X_test = encode_data(X_test, nominal_encoder, scaler)
X_test = memory_optimiz(X_test)
------ HomePlanet_update ------------ HomePlanet NULLs: 35 1. (after replacement through the GroupId) HomePlanet NULLs: 17 2. (after replacement through the Deck) HomePlanet NULLs: 9 3. (after replacement through the LastName) HomePlanet NULLs: 1 4. (after replacement through the Destination): 0 5. (after replacement by the most common value) HomePlanet NULLs: 0 ------ Destination_update ------------ Destination NULLs: 29 (after replacement by the most common value) Destination NULLs: 0 ------ LastName_update ------------ LastName NULLs: 35 (1554, 2) (after update) LastName NULLs: 29 namesakes_num_in_group NULLs: 29 namesakes_num_in_group NULLs: 0 ------ CabinDeck_update ------------ CabinDeck NULLs: 42 (after update throw GroupId) CabinDeck NULLs: 30 (after update throw HomePlanet) CabinDeck NULLs: 0 ------ CabinNum_update ------------ CabinNum NULLs: 42 ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'T'] B (10, 1) F (8, 1) G (24, 1) CabinNum NULLs: 0 ------ CabinSide_update ------------ CabinSide NULLs: 42 CabinSide NULLs: 18 CabinSide NULLs: 0 ------ VIP_update ------------ VIP NULLs: 37 VIP NULLs: 0 ------ CryoSleep_update ------------ 52 29 0 ------ Age_update ------------ Nulls in Expenses: 25 Nulls in Expenses: 0 Nulls in Expenses: 0 ------ Expenses_update ------------ Nulls in Expenses: 188 Nulls in Expenses: 172 Nulls in Expenses: 111 (121, 5) (103, 5) (58, 5) (7, 5) (0, 5) (792, 5) (332, 5) (296, 5) (18, 5) (12, 5) Nulls in Expenses: 0 Nulls in Expenses: 0 ------ log_expenses ------------ ------ New_features_update ------------ ------ bool_to_int ------------ ------ memory_optimiz ------------ Reduction = 48.89%
data = pd.read_csv("train.csv", sep=',', engine='python')
X, age_medians_all, expenses_means_all = data_transform(data.drop(columns="Transported"), for_visual_df)
y = data["Transported"].astype(int)
------ HomePlanet_update ------------ HomePlanet NULLs: 201 1. (after replacement through the GroupId) HomePlanet NULLs: 111 2. (after replacement through the Deck) HomePlanet NULLs: 63 3. (after replacement through the LastName) HomePlanet NULLs: 8 4. (after replacement through the Destination): 0 5. (after replacement by the most common value) HomePlanet NULLs: 0 ------ Destination_update ------------ Destination NULLs: 182 (after replacement by the most common value) Destination NULLs: 0 ------ LastName_update ------------ LastName NULLs: 200 (6113, 2) (after update) LastName NULLs: 104 namesakes_num_in_group NULLs: 104 namesakes_num_in_group NULLs: 0 ------ CabinDeck_update ------------ CabinDeck NULLs: 199 (after update throw GroupId) CabinDeck NULLs: 99 (after update throw HomePlanet) CabinDeck NULLs: 0 ------ CabinNum_update ------------ CabinNum NULLs: 199 ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'T'] A (6, 1) B (38, 1) C (14, 1) D (9, 1) E (8, 1) F (36, 1) G (88, 1) CabinNum NULLs: 0 ------ CabinSide_update ------------ CabinSide NULLs: 199 CabinSide NULLs: 99 CabinSide NULLs: 0 ------ VIP_update ------------ VIP NULLs: 203 VIP NULLs: 0 ------ CryoSleep_update ------------ 217 130 0 ------ Age_update ------------ Nulls in Expenses: 179 Nulls in Expenses: 0 Nulls in Expenses: 0 ------ Expenses_update ------------ Nulls in Expenses: 943 Nulls in Expenses: 853 Nulls in Expenses: 537 Nulls in Expenses: 537 Nulls in Expenses: 0 ------ log_expenses ------------ ------ New_features_update ------------ ------ bool_to_int ------------
def encode_all_data(df):
numeric_selector = make_column_selector(dtype_include=np.number)
numeric_columns = numeric_selector(df)
print(numeric_columns)
nominal_columns = ["HomePlanet", "Destination", 'CabinSide', 'CabinDeck', 'Route']
nominal_categories = [list(sorted(df[column].unique())) for column in nominal_columns]
print(nominal_categories)
nominal_encoder = OneHotEncoder(categories=nominal_categories, sparse=False)
nominal_encoder.fit(df.loc[:, nominal_columns])
df_nominal = nominal_encoder.transform(df.loc[:, nominal_columns])
df_nominal = pd.DataFrame(
df_nominal,
columns=nominal_encoder.get_feature_names(),
index=df.index
)
df = pd.concat((df_nominal, df.drop(columns=nominal_columns)), axis=1)
# numerische features skalieren
scaler = MinMaxScaler()
scaler.fit(df[numeric_columns])
df.loc[:, numeric_columns] = scaler.transform(df[numeric_columns])
return df, nominal_encoder, scaler
X, nominal_encoder_all, scaler_all = encode_all_data(X)
X = memory_optimiz(X)
['CryoSleep', 'Age', 'VIP', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'GroupId', 'NumInGroup', 'CabinNum', 'IsChild', 'TotalSpend', 'GroupSize', 'IsSingle', 'namesakes_num_in_group', 'NameLength', 'NoSpend'] [['Earth', 'Europa', 'Mars'], ['55 Cancri e', 'PSO J318.5-22', 'TRAPPIST-1e'], ['O', 'P', 'S'], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'T'], ['Earth - 55 Cancri e', 'Earth - PSO J318.5-22', 'Earth - TRAPPIST-1e', 'Europa - 55 Cancri e', 'Europa - PSO J318.5-22', 'Europa - TRAPPIST-1e', 'Mars - 55 Cancri e', 'Mars - PSO J318.5-22', 'Mars - TRAPPIST-1e']] ------ memory_optimiz ------------ Reduction = 45.08%
roc_curves = {}
model_acc = {}
def get_roc_curve(y_test, y_test_pred_prob):
fpr, tpr, thresholds = roc_curve(y_test, y_test_pred_prob)
rel = {}
for i in range(0, len(fpr)):
rel[i] = tpr[i] + (1 - fpr[i])
best_threshold_ind = max(rel, key=rel.get)
print('tpr =', tpr[best_threshold_ind], 'fpr =', fpr[best_threshold_ind], 'threshold =', thresholds[best_threshold_ind] )
return fpr, tpr, best_threshold_ind, thresholds[best_threshold_ind]
def plot_roc_curve(fpr, tpr, best_threshold_ind):
fig, axs = plt.subplots(figsize=(10, 6))
plt.hlines(y=tpr[best_threshold_ind], xmin=0, xmax=fpr[best_threshold_ind], linestyles='--')
plt.vlines(x=fpr[best_threshold_ind], ymin=0, ymax=tpr[best_threshold_ind], linestyles='--')
plt.plot([0, 1], [0, 1], linestyle='dashed', color='red')
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC curve')
param_grid = {
'penalty': ['l1','l2'],
'C': [0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2],
'max_iter': [50, 100, 150]
}
logisticRegression_gridsearch_hp_tuning = GridSearchCV(
LogisticRegression(),
param_grid=param_grid,
# scoring="accuracy",
n_jobs=-1,
verbose = 4
)
logisticRegression_gridsearch_hp_tuning.fit(X_train, y_train)
print(logisticRegression_gridsearch_hp_tuning.best_params_)
print("best_score = ", logisticRegression_gridsearch_hp_tuning.best_score_)
print('acc X_train = ', logisticRegression_gridsearch_hp_tuning.score(X_train, y_train))
Fitting 5 folds for each of 48 candidates, totalling 240 fits
C:\Users\natalie\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:918: UserWarning: One or more of the test scores are non-finite: [ nan 0.77149921 nan 0.77149921 nan 0.77149921
nan 0.77538224 nan 0.77538224 nan 0.77538224
nan 0.77595726 nan 0.77624503 nan 0.77624503
nan 0.77610083 nan 0.7763886 nan 0.77653238
nan 0.77552571 nan 0.77624482 nan 0.77624482
nan 0.77595705 nan 0.77624472 nan 0.7763886
nan 0.77595716 nan 0.77595716 nan 0.77595716
nan 0.77581327 nan 0.77610104 nan 0.77581337]
warnings.warn(
{'C': 0.75, 'max_iter': 150, 'penalty': 'l2'}
best_score = 0.7765323844447087
acc X_train = 0.7795513373597929
lg_reg_mod = LogisticRegression(C = 0.75, max_iter = 150, penalty = 'l2').fit(X_train, y_train)
print('Acc Train =', round(lg_reg_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(lg_reg_mod.score(X_test, y_test) * 100, 2))
model_acc['LogisticRegression'] = (round(lg_reg_mod.score(X_test, y_test) * 100, 2), lg_reg_mod)
y_test_pred_prob = pd.DataFrame(lg_reg_mod.predict_proba(X_test)[:, 1])
#y_test_pred_prob
Acc Train = 77.96 Acc Test = 78.32
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['LogisticRegression'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.7886178861788617 fpr = 0.2152619589977221 threshold = 0.5435228456093995
# pr, re, thres = precision_recall_curve( y_test, y_test_pred_prob)
# fig, axs = plt.subplots(figsize=(12, 6))
# plt.plot(re, pr)
# plt.xlabel('recall')
# plt.ylabel('precision')
#y_test_pred_prob
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.544 = 78.6084
param_grid = {
'max_depth': [3, 5, 10],
'min_samples_leaf': [1, 3, 5, 8]
}
decisionTreeClassifier_gridsearch_hp_tuning = GridSearchCV(
DecisionTreeClassifier(),
param_grid=param_grid,
# scoring="accuracy",
n_jobs=-1,
verbose = 4
)
decisionTreeClassifier_gridsearch_hp_tuning.fit(X_train, y_train)
print(decisionTreeClassifier_gridsearch_hp_tuning.best_params_)
print("best_score = ", decisionTreeClassifier_gridsearch_hp_tuning.best_score_)
print('acc X_train = ', decisionTreeClassifier_gridsearch_hp_tuning.score(X_train, y_train))
Fitting 5 folds for each of 12 candidates, totalling 60 fits
{'max_depth': 5, 'min_samples_leaf': 3}
best_score = 0.780416242132103
acc X_train = 0.7985332182916307
dt_mod = DecisionTreeClassifier(max_depth = 5, min_samples_leaf = 3).fit(X_train, y_train)
print('Acc Train =', round(dt_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(dt_mod.score(X_test, y_test) * 100, 2))
model_acc['DecisionTreeClassifier'] = (round(dt_mod.score(X_test, y_test) * 100, 2), dt_mod)
y_test_pred_prob = pd.DataFrame(dt_mod.predict_proba(X_test)[:, 1])
Acc Train = 79.85 Acc Test = 78.55
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['DecisionTreeClassifier'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.8130081300813008 fpr = 0.22209567198177677 threshold = 0.5393258426966292
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.539 = 78.6084
param_grid = {
'max_depth': [3, 5, 10],
'min_samples_leaf': [1, 3, 5, 8],
'n_estimators': [50, 100, 150, 200, 250, 300]
}
randomForestClassifier_gridsearch_hp_tuning = GridSearchCV(
RandomForestClassifier(),
param_grid=param_grid,
n_jobs=-1,
verbose = 4
)
randomForestClassifier_gridsearch_hp_tuning.fit(X_train, y_train)
print(randomForestClassifier_gridsearch_hp_tuning.best_params_)
print("best_score = ", randomForestClassifier_gridsearch_hp_tuning.best_score_)
print('acc X_train = ', randomForestClassifier_gridsearch_hp_tuning.score(X_train, y_train))
Fitting 5 folds for each of 72 candidates, totalling 360 fits
{'max_depth': 10, 'min_samples_leaf': 1, 'n_estimators': 250}
best_score = 0.8012674490170625
acc X_train = 0.8809318377911993
rf_mod = RandomForestClassifier(max_depth=10, min_samples_leaf=1, n_estimators = 250)
rf_mod.fit(X_train, y_train)
print('Acc Train =', round(rf_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(rf_mod.score(X_test, y_test) * 100, 2))
model_acc['RandomForestClassifier'] = (round(rf_mod.score(X_test, y_test) * 100, 2), rf_mod)
d = {'Features': X_train.columns, 'importance': rf_mod.feature_importances_*100}
features_imp_df = pd.DataFrame(data=d).sort_values('importance', ascending=False)
features_imp_df
Acc Train = 87.92 Acc Test = 80.85
| Features | importance | |
|---|---|---|
| 38 | TotalSpend | 13.423982 |
| 43 | NoSpend | 8.889983 |
| 30 | FoodCourt | 7.665911 |
| 32 | Spa | 7.534157 |
| 26 | CryoSleep | 7.199363 |
| 29 | RoomService | 7.194973 |
| 33 | VRDeck | 6.907424 |
| 31 | ShoppingMall | 5.928310 |
| 36 | CabinNum | 4.457665 |
| 34 | GroupId | 4.252145 |
| 27 | Age | 3.387283 |
| 0 | x0_Earth | 2.278378 |
| 42 | NameLength | 2.105918 |
| 1 | x0_Europa | 1.859197 |
| 13 | x3_E | 1.531294 |
| 19 | x4_Earth - TRAPPIST-1e | 1.347522 |
| 15 | x3_G | 1.231052 |
| 14 | x3_F | 1.170829 |
| 7 | x2_P | 1.124561 |
| 8 | x2_S | 1.124489 |
| 39 | GroupSize | 1.011184 |
| 41 | namesakes_num_in_group | 0.966701 |
| 35 | NumInGroup | 0.952475 |
| 2 | x0_Mars | 0.678656 |
| 25 | x4_Mars - TRAPPIST-1e | 0.596319 |
| 37 | IsChild | 0.548211 |
| 11 | x3_C | 0.532934 |
| 20 | x4_Europa - 55 Cancri e | 0.508754 |
| 10 | x3_B | 0.480592 |
| 22 | x4_Europa - TRAPPIST-1e | 0.450021 |
| 5 | x1_TRAPPIST-1e | 0.445491 |
| 40 | IsSingle | 0.379511 |
| 3 | x1_55 Cancri e | 0.337420 |
| 4 | x1_PSO J318.5-22 | 0.322843 |
| 18 | x4_Earth - PSO J318.5-22 | 0.242510 |
| 17 | x4_Earth - 55 Cancri e | 0.213325 |
| 12 | x3_D | 0.213291 |
| 9 | x3_A | 0.136160 |
| 28 | VIP | 0.125256 |
| 6 | x2_O | 0.085157 |
| 23 | x4_Mars - 55 Cancri e | 0.078498 |
| 24 | x4_Mars - PSO J318.5-22 | 0.066421 |
| 21 | x4_Europa - PSO J318.5-22 | 0.011114 |
| 16 | x3_T | 0.002720 |
y_test_pred_prob = pd.DataFrame(rf_mod.predict_proba(X_test)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['RandomForestClassifier'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.7921022067363531 fpr = 0.1662870159453303 threshold = 0.5199264252758352
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.52 = 81.25359
# min_samples_leaf=[3, 5, 8]
param_grid = {
"learning_rate": [0.01, 0.05, 0.1, 0.2, 0.5],
"n_estimators": [50, 100, 150, 200, 250],
'max_depth' : [3, 5, 8],
'min_samples_leaf' : [1, 3, 5, 8]
}
gradientBoostingClassifier_gridsearch_hp_tuning = GridSearchCV(
GradientBoostingClassifier(),
param_grid=param_grid,
# scoring="accuracy",
n_jobs=-1,
verbose = 2
)
gradientBoostingClassifier_gridsearch_hp_tuning.fit(X_train, y_train)
print(gradientBoostingClassifier_gridsearch_hp_tuning.best_params_)
print("best_score = ", gradientBoostingClassifier_gridsearch_hp_tuning.best_score_)
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
{'learning_rate': 0.1, 'max_depth': 3, 'min_samples_leaf': 5, 'n_estimators': 200}
best_score = 0.8065877765077657
gb_mod = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3, min_samples_leaf=5)
gb_mod.fit(X_train, y_train)
print('Acc Train =', round(gb_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(gb_mod.score(X_test, y_test) * 100, 2))
model_acc['GradientBoostingClassifier'] = (round(gb_mod.score(X_test, y_test) * 100, 2), gb_mod)
Acc Train = 85.1 Acc Test = 81.14
y_test_pred_prob = pd.DataFrame(gb_mod.predict_proba(X_test)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['GradientBoostingClassifier'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.8141695702671312 fpr = 0.18337129840546698 threshold = 0.5201111213823948
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.52 = 81.48361
# gb_mod_all = GradientBoostingClassifier(n_estimators=100, learning_rate=0.05, max_depth=5, min_samples_leaf=3)
# gb_mod_all.fit(X, y)
# print('Acc Train =', round(gb_mod_all.score(X, y) * 100, 2))
# y_pred_prob = pd.DataFrame(gb_mod_all.predict_proba(X)[:, 1])
# best_threshold = get_roc_curve (y, y_pred_prob)
pca_model = PCA()
pca_model.fit(X)
pca_model.explained_variance_
array([8.01253617e-01, 7.08186507e-01, 5.50042152e-01, 4.92000550e-01,
4.18913543e-01, 2.88391322e-01, 2.11822823e-01, 1.95599705e-01,
1.31432563e-01, 1.21141076e-01, 1.05553843e-01, 1.01045758e-01,
9.06879306e-02, 6.78450093e-02, 5.48487343e-02, 4.73572724e-02,
4.51527983e-02, 4.16213982e-02, 4.07905392e-02, 3.78172956e-02,
3.68256941e-02, 3.44146900e-02, 2.15689056e-02, 1.77277718e-02,
1.77043695e-02, 1.63565874e-02, 1.42079741e-02, 1.26515273e-02,
1.13545330e-02, 8.96807853e-03, 5.73974941e-03, 4.39572055e-03,
3.51034175e-03, 6.49402908e-04, 9.42942285e-14, 2.51186384e-14,
1.56139309e-14, 7.41888969e-15, 4.78846487e-15, 1.73407730e-15,
1.73407730e-15, 1.73407730e-15, 1.73407730e-15, 1.17523376e-15],
dtype=float32)
plt.figure(figsize=(8,6))
plt.plot(np.cumsum(pca_model.explained_variance_ratio_))
plt.xlabel("PC Components")
plt.ylabel("Explained Variance")
Text(0, 0.5, 'Explained Variance')
pca_model_90 = PCA(n_components=2)
pca_model_90.fit(X)
X_train_auto_reduced = pca_model_90.transform(X)
X_train_auto_reduced.shape
(8693, 2)
plt.figure(figsize=(10,8))
plt.scatter(X_train_auto_reduced[:, 0], X_train_auto_reduced[:, 1], c=y)
plt.xlabel("Component 0")
plt.ylabel("Component 1")
Text(0, 0.5, 'Component 1')
# X_train_auto_reduced = pd.DataFrame(X_train_auto_reduced)
# X_pca = pd.concat((X, X_train_auto_reduced), axis=1)
# X_pca
# pca_model_90.fit(X_test)
# X_test_auto_reduced = pca_model_90.transform(X_test)
# X_test_auto_reduced.shape
# X_test_auto_reduced = pd.DataFrame(X_test_auto_reduced)
# X_test_pca = pd.concat((X_test, X_test_auto_reduced), axis=1)
# X_test_pca
# gb_with_pca_mod = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=4, min_samples_leaf=3)
# gb_with_pca_mod.fit(X_pca, y)
# print('Acc Train =', round(gb_with_pca_mod.score(X_pca, y) * 100, 2))
# print('Acc Test =', round(gb_with_pca_mod.score(X_test_pca, y_test) * 100, 2))
param_grid = {
'C': [0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5],
'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
'gamma': ['scale', 'auto']
}
SVC_gridsearch_hp_tuning = GridSearchCV(
SVC(),
param_grid=param_grid,
# scoring="accuracy",
n_jobs=-1,
verbose = 2
)
SVC_gridsearch_hp_tuning.fit(X_train, y_train)
print(SVC_gridsearch_hp_tuning.best_params_)
print("best_score = ", SVC_gridsearch_hp_tuning.best_score_)
Fitting 5 folds for each of 56 candidates, totalling 280 fits
{'C': 1.5, 'gamma': 'scale', 'kernel': 'poly'}
best_score = 0.7935010783608915
svc_mod = SVC(C = 1.5, gamma = 'scale', kernel = 'poly', probability = True)
svc_mod.fit(X_train, y_train)
print('Acc Train =', round(svc_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(svc_mod.score(X_test, y_test) * 100, 2))
model_acc['SVC'] = (round(svc_mod.score(X_test, y_test) * 100, 2), svc_mod)
Acc Train = 81.85 Acc Test = 79.64
y_test_pred_prob = pd.DataFrame(svc_mod.predict_proba(X_test)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['SVC'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.8385598141695703 fpr = 0.23462414578587698 threshold = 0.456507117667376
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.457 = 80.10351
param_grid = {
'n_neighbors': [2, 3, 5, 8, 10, 12],
'p': [1, 2]
}
kNeighborsClassifier_gridsearch_hp_tuning = GridSearchCV(
KNeighborsClassifier(),
param_grid=param_grid,
# scoring="accuracy",
n_jobs=-1,
verbose = 2
)
kNeighborsClassifier_gridsearch_hp_tuning.fit(X_train, y_train)
print(kNeighborsClassifier_gridsearch_hp_tuning.best_params_)
print("best_score = ", kNeighborsClassifier_gridsearch_hp_tuning.best_score_)
Fitting 5 folds for each of 12 candidates, totalling 60 fits
{'n_neighbors': 12, 'p': 2}
best_score = 0.7622956415600806
kn_mod = KNeighborsClassifier(n_neighbors=12, p=2)
kn_mod.fit(X_train, y_train)
print('Acc Train =', round(kn_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(kn_mod.score(X_test, y_test) * 100, 2))
model_acc['KNeighborsClassifier'] = (round(kn_mod.score(X_test, y_test) * 100, 2), kn_mod)
Acc Train = 80.21 Acc Test = 76.6
y_test_pred_prob = pd.DataFrame(kn_mod.predict_proba(X_test)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['KNeighborsClassifier'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.6968641114982579 fpr = 0.1662870159453303 threshold = 0.5833333333333334
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.583 = 76.59574
param_grid = {
"learning_rate": [0.01, 0.05, 0.1, 0.2, 0.5],
"n_estimators": [50, 100, 150, 200, 250],
'max_depth' : [3, 5, 8]
}
XGBClassifier_gridsearch_hp_tuning = GridSearchCV(
XGBClassifier(),
param_grid=param_grid,
# scoring="accuracy",
n_jobs=-1,
verbose = 2
)
XGBClassifier_gridsearch_hp_tuning.fit(X_train, y_train)
print(XGBClassifier_gridsearch_hp_tuning.best_params_)
print("best_score = ", XGBClassifier_gridsearch_hp_tuning.best_score_)
Fitting 5 folds for each of 75 candidates, totalling 375 fits
{'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 250}
best_score = 0.8065878799476595
xgb_mod = XGBClassifier(n_estimators=250, learning_rate=0.1, max_depth=3)
xgb_mod.fit(X_train, y_train)
print('Acc Train =', round(xgb_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(xgb_mod.score(X_test, y_test) * 100, 2))
model_acc['XGBClassifier'] = (round(xgb_mod.score(X_test, y_test) * 100, 2), xgb_mod)
Acc Train = 85.09 Acc Test = 80.56
y_test_pred_prob = pd.DataFrame(xgb_mod.predict_proba(X_test)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['XGBClassifier'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.8176538908246226 fpr = 0.1958997722095672 threshold = 0.51270777
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.513 = 81.02358
# xgb_mod = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=5)
# xgb_mod.fit(X, y)
# print('Acc Train =', round(xgb_mod.score(X, y) * 100, 2))
param_grid = {
"learning_rate": [0.01, 0.05, 0.1, 0.5],
"n_estimators": [100, 200, 300, 400, 500],
'max_depth' : [3, 5, 8]
}
catBoostClassifier_gridsearch_hp_tuning = GridSearchCV(
CatBoostClassifier(),
param_grid=param_grid,
n_jobs=-1,
verbose = 2
)
catBoostClassifier_gridsearch_hp_tuning.fit(X_train, y_train)
print(catBoostClassifier_gridsearch_hp_tuning.best_params_)
print("best_score = ", catBoostClassifier_gridsearch_hp_tuning.best_score_)
Fitting 5 folds for each of 60 candidates, totalling 300 fits
0: learn: 0.6715886 total: 147ms remaining: 1m 13s
1: learn: 0.6497685 total: 152ms remaining: 37.8s
2: learn: 0.6313424 total: 155ms remaining: 25.8s
3: learn: 0.6165955 total: 159ms remaining: 19.8s
4: learn: 0.6047565 total: 163ms remaining: 16.2s
5: learn: 0.5901331 total: 167ms remaining: 13.8s
6: learn: 0.5783931 total: 171ms remaining: 12s
7: learn: 0.5692709 total: 174ms remaining: 10.7s
8: learn: 0.5601853 total: 178ms remaining: 9.73s
9: learn: 0.5516509 total: 182ms remaining: 8.94s
10: learn: 0.5449478 total: 186ms remaining: 8.27s
11: learn: 0.5387620 total: 190ms remaining: 7.73s
12: learn: 0.5298248 total: 194ms remaining: 7.27s
13: learn: 0.5250956 total: 198ms remaining: 6.88s
14: learn: 0.5206796 total: 202ms remaining: 6.54s
15: learn: 0.5134477 total: 206ms remaining: 6.24s
16: learn: 0.5093080 total: 210ms remaining: 5.98s
17: learn: 0.5049560 total: 214ms remaining: 5.74s
18: learn: 0.5000185 total: 219ms remaining: 5.53s
19: learn: 0.4967157 total: 223ms remaining: 5.34s
20: learn: 0.4921926 total: 227ms remaining: 5.17s
21: learn: 0.4885324 total: 231ms remaining: 5.01s
22: learn: 0.4858474 total: 235ms remaining: 4.87s
23: learn: 0.4825161 total: 239ms remaining: 4.74s
24: learn: 0.4784191 total: 243ms remaining: 4.62s
25: learn: 0.4753227 total: 247ms remaining: 4.5s
26: learn: 0.4727560 total: 251ms remaining: 4.39s
27: learn: 0.4701519 total: 255ms remaining: 4.29s
28: learn: 0.4672838 total: 259ms remaining: 4.21s
29: learn: 0.4650169 total: 263ms remaining: 4.12s
30: learn: 0.4626239 total: 267ms remaining: 4.04s
31: learn: 0.4600279 total: 271ms remaining: 3.97s
32: learn: 0.4581928 total: 275ms remaining: 3.9s
33: learn: 0.4558674 total: 279ms remaining: 3.83s
34: learn: 0.4532745 total: 284ms remaining: 3.77s
35: learn: 0.4508607 total: 288ms remaining: 3.71s
36: learn: 0.4486973 total: 292ms remaining: 3.65s
37: learn: 0.4468958 total: 296ms remaining: 3.6s
38: learn: 0.4453799 total: 300ms remaining: 3.54s
39: learn: 0.4440906 total: 304ms remaining: 3.49s
40: learn: 0.4428139 total: 308ms remaining: 3.45s
41: learn: 0.4411980 total: 312ms remaining: 3.4s
42: learn: 0.4397635 total: 316ms remaining: 3.36s
43: learn: 0.4382574 total: 320ms remaining: 3.32s
44: learn: 0.4362663 total: 325ms remaining: 3.28s
45: learn: 0.4352669 total: 329ms remaining: 3.25s
46: learn: 0.4338928 total: 334ms remaining: 3.21s
47: learn: 0.4331745 total: 338ms remaining: 3.18s
48: learn: 0.4318094 total: 342ms remaining: 3.15s
49: learn: 0.4308120 total: 346ms remaining: 3.12s
50: learn: 0.4294596 total: 351ms remaining: 3.09s
51: learn: 0.4288123 total: 355ms remaining: 3.06s
52: learn: 0.4279420 total: 360ms remaining: 3.03s
53: learn: 0.4269846 total: 364ms remaining: 3s
54: learn: 0.4261661 total: 368ms remaining: 2.98s
55: learn: 0.4246760 total: 373ms remaining: 2.95s
56: learn: 0.4234944 total: 377ms remaining: 2.93s
57: learn: 0.4222398 total: 382ms remaining: 2.91s
58: learn: 0.4211157 total: 387ms remaining: 2.89s
59: learn: 0.4206419 total: 391ms remaining: 2.87s
60: learn: 0.4201702 total: 396ms remaining: 2.85s
61: learn: 0.4196927 total: 401ms remaining: 2.83s
62: learn: 0.4191732 total: 405ms remaining: 2.81s
63: learn: 0.4182540 total: 410ms remaining: 2.79s
64: learn: 0.4171922 total: 415ms remaining: 2.77s
65: learn: 0.4161270 total: 419ms remaining: 2.75s
66: learn: 0.4156055 total: 424ms remaining: 2.74s
67: learn: 0.4148372 total: 428ms remaining: 2.72s
68: learn: 0.4140445 total: 433ms remaining: 2.7s
69: learn: 0.4131414 total: 438ms remaining: 2.69s
70: learn: 0.4124092 total: 442ms remaining: 2.67s
71: learn: 0.4118613 total: 447ms remaining: 2.65s
72: learn: 0.4109336 total: 451ms remaining: 2.64s
73: learn: 0.4099184 total: 455ms remaining: 2.62s
74: learn: 0.4090886 total: 459ms remaining: 2.6s
75: learn: 0.4086462 total: 464ms remaining: 2.59s
76: learn: 0.4076658 total: 468ms remaining: 2.57s
77: learn: 0.4071392 total: 472ms remaining: 2.56s
78: learn: 0.4060856 total: 477ms remaining: 2.54s
79: learn: 0.4054211 total: 481ms remaining: 2.52s
80: learn: 0.4047032 total: 485ms remaining: 2.51s
81: learn: 0.4042291 total: 489ms remaining: 2.49s
82: learn: 0.4038277 total: 493ms remaining: 2.48s
83: learn: 0.4034973 total: 497ms remaining: 2.46s
84: learn: 0.4028241 total: 501ms remaining: 2.45s
85: learn: 0.4024115 total: 505ms remaining: 2.43s
86: learn: 0.4013696 total: 510ms remaining: 2.42s
87: learn: 0.4008991 total: 514ms remaining: 2.4s
88: learn: 0.4004595 total: 518ms remaining: 2.39s
89: learn: 0.4001107 total: 522ms remaining: 2.38s
90: learn: 0.3996613 total: 526ms remaining: 2.37s
91: learn: 0.3991905 total: 531ms remaining: 2.35s
92: learn: 0.3984470 total: 534ms remaining: 2.34s
93: learn: 0.3977621 total: 539ms remaining: 2.33s
94: learn: 0.3975819 total: 543ms remaining: 2.32s
95: learn: 0.3972035 total: 548ms remaining: 2.31s
96: learn: 0.3967400 total: 552ms remaining: 2.29s
97: learn: 0.3962854 total: 557ms remaining: 2.28s
98: learn: 0.3959428 total: 561ms remaining: 2.27s
99: learn: 0.3955785 total: 566ms remaining: 2.26s
100: learn: 0.3953569 total: 570ms remaining: 2.25s
101: learn: 0.3947103 total: 575ms remaining: 2.24s
102: learn: 0.3938517 total: 580ms remaining: 2.23s
103: learn: 0.3935609 total: 584ms remaining: 2.22s
104: learn: 0.3928339 total: 588ms remaining: 2.21s
105: learn: 0.3922325 total: 592ms remaining: 2.2s
106: learn: 0.3918105 total: 596ms remaining: 2.19s
107: learn: 0.3913443 total: 601ms remaining: 2.18s
108: learn: 0.3909939 total: 605ms remaining: 2.17s
109: learn: 0.3906304 total: 609ms remaining: 2.16s
110: learn: 0.3901867 total: 613ms remaining: 2.15s
111: learn: 0.3899108 total: 617ms remaining: 2.14s
112: learn: 0.3896551 total: 621ms remaining: 2.13s
113: learn: 0.3890135 total: 625ms remaining: 2.12s
114: learn: 0.3886448 total: 630ms remaining: 2.11s
115: learn: 0.3883451 total: 634ms remaining: 2.1s
116: learn: 0.3880929 total: 638ms remaining: 2.09s
117: learn: 0.3878145 total: 643ms remaining: 2.08s
118: learn: 0.3873152 total: 647ms remaining: 2.07s
119: learn: 0.3870133 total: 651ms remaining: 2.06s
120: learn: 0.3866560 total: 655ms remaining: 2.05s
121: learn: 0.3862752 total: 659ms remaining: 2.04s
122: learn: 0.3855381 total: 663ms remaining: 2.03s
123: learn: 0.3852125 total: 668ms remaining: 2.02s
124: learn: 0.3848299 total: 672ms remaining: 2.02s
125: learn: 0.3844479 total: 677ms remaining: 2.01s
126: learn: 0.3840590 total: 681ms remaining: 2s
127: learn: 0.3837507 total: 685ms remaining: 1.99s
128: learn: 0.3833959 total: 689ms remaining: 1.98s
129: learn: 0.3830532 total: 693ms remaining: 1.97s
130: learn: 0.3827598 total: 697ms remaining: 1.96s
131: learn: 0.3821048 total: 701ms remaining: 1.96s
132: learn: 0.3818578 total: 705ms remaining: 1.95s
133: learn: 0.3815380 total: 709ms remaining: 1.94s
134: learn: 0.3813210 total: 713ms remaining: 1.93s
135: learn: 0.3810244 total: 718ms remaining: 1.92s
136: learn: 0.3808358 total: 722ms remaining: 1.91s
137: learn: 0.3806206 total: 727ms remaining: 1.91s
138: learn: 0.3803516 total: 731ms remaining: 1.9s
139: learn: 0.3801208 total: 735ms remaining: 1.89s
140: learn: 0.3797165 total: 739ms remaining: 1.88s
141: learn: 0.3790137 total: 743ms remaining: 1.87s
142: learn: 0.3786261 total: 748ms remaining: 1.87s
143: learn: 0.3783452 total: 752ms remaining: 1.86s
144: learn: 0.3780608 total: 756ms remaining: 1.85s
145: learn: 0.3777550 total: 761ms remaining: 1.84s
146: learn: 0.3774761 total: 765ms remaining: 1.84s
147: learn: 0.3769238 total: 769ms remaining: 1.83s
148: learn: 0.3767423 total: 773ms remaining: 1.82s
149: learn: 0.3763279 total: 777ms remaining: 1.81s
150: learn: 0.3760566 total: 781ms remaining: 1.81s
151: learn: 0.3757014 total: 786ms remaining: 1.8s
152: learn: 0.3753736 total: 790ms remaining: 1.79s
153: learn: 0.3750068 total: 794ms remaining: 1.78s
154: learn: 0.3746859 total: 798ms remaining: 1.77s
155: learn: 0.3743231 total: 802ms remaining: 1.77s
156: learn: 0.3740556 total: 806ms remaining: 1.76s
157: learn: 0.3737772 total: 810ms remaining: 1.75s
158: learn: 0.3735753 total: 814ms remaining: 1.75s
159: learn: 0.3734153 total: 819ms remaining: 1.74s
160: learn: 0.3728946 total: 823ms remaining: 1.73s
161: learn: 0.3724313 total: 827ms remaining: 1.72s
162: learn: 0.3719776 total: 831ms remaining: 1.72s
163: learn: 0.3717913 total: 835ms remaining: 1.71s
164: learn: 0.3715023 total: 839ms remaining: 1.7s
165: learn: 0.3712874 total: 843ms remaining: 1.7s
166: learn: 0.3709246 total: 847ms remaining: 1.69s
167: learn: 0.3705663 total: 852ms remaining: 1.68s
168: learn: 0.3703596 total: 856ms remaining: 1.68s
169: learn: 0.3701783 total: 860ms remaining: 1.67s
170: learn: 0.3698647 total: 864ms remaining: 1.66s
171: learn: 0.3689951 total: 868ms remaining: 1.65s
172: learn: 0.3687132 total: 872ms remaining: 1.65s
173: learn: 0.3684437 total: 876ms remaining: 1.64s
174: learn: 0.3680575 total: 880ms remaining: 1.63s
175: learn: 0.3677380 total: 884ms remaining: 1.63s
176: learn: 0.3675157 total: 888ms remaining: 1.62s
177: learn: 0.3672580 total: 892ms remaining: 1.61s
178: learn: 0.3667561 total: 896ms remaining: 1.61s
179: learn: 0.3665125 total: 900ms remaining: 1.6s
180: learn: 0.3662650 total: 904ms remaining: 1.59s
181: learn: 0.3658646 total: 909ms remaining: 1.59s
182: learn: 0.3654361 total: 913ms remaining: 1.58s
183: learn: 0.3651905 total: 917ms remaining: 1.57s
184: learn: 0.3647995 total: 922ms remaining: 1.57s
185: learn: 0.3644163 total: 926ms remaining: 1.56s
186: learn: 0.3641485 total: 930ms remaining: 1.56s
187: learn: 0.3639176 total: 934ms remaining: 1.55s
188: learn: 0.3636846 total: 939ms remaining: 1.54s
189: learn: 0.3632983 total: 943ms remaining: 1.54s
190: learn: 0.3630024 total: 947ms remaining: 1.53s
191: learn: 0.3628494 total: 951ms remaining: 1.53s
192: learn: 0.3624410 total: 956ms remaining: 1.52s
193: learn: 0.3620925 total: 960ms remaining: 1.51s
194: learn: 0.3615618 total: 964ms remaining: 1.51s
195: learn: 0.3613007 total: 968ms remaining: 1.5s
196: learn: 0.3610058 total: 972ms remaining: 1.49s
197: learn: 0.3606136 total: 976ms remaining: 1.49s
198: learn: 0.3603982 total: 980ms remaining: 1.48s
199: learn: 0.3601614 total: 984ms remaining: 1.48s
200: learn: 0.3598831 total: 988ms remaining: 1.47s
201: learn: 0.3594937 total: 992ms remaining: 1.46s
202: learn: 0.3591371 total: 996ms remaining: 1.46s
203: learn: 0.3587662 total: 1s remaining: 1.45s
204: learn: 0.3585065 total: 1s remaining: 1.45s
205: learn: 0.3582238 total: 1.01s remaining: 1.44s
206: learn: 0.3579861 total: 1.01s remaining: 1.43s
207: learn: 0.3575965 total: 1.02s remaining: 1.43s
208: learn: 0.3573726 total: 1.02s remaining: 1.42s
209: learn: 0.3571523 total: 1.02s remaining: 1.42s
210: learn: 0.3567146 total: 1.03s remaining: 1.41s
211: learn: 0.3564900 total: 1.03s remaining: 1.4s
212: learn: 0.3562025 total: 1.04s remaining: 1.4s
213: learn: 0.3560374 total: 1.04s remaining: 1.39s
214: learn: 0.3558365 total: 1.04s remaining: 1.39s
215: learn: 0.3554548 total: 1.05s remaining: 1.38s
216: learn: 0.3551950 total: 1.05s remaining: 1.37s
217: learn: 0.3548987 total: 1.06s remaining: 1.37s
218: learn: 0.3546235 total: 1.06s remaining: 1.36s
219: learn: 0.3544548 total: 1.06s remaining: 1.35s
220: learn: 0.3539808 total: 1.07s remaining: 1.35s
221: learn: 0.3536188 total: 1.07s remaining: 1.34s
222: learn: 0.3534571 total: 1.08s remaining: 1.34s
223: learn: 0.3530476 total: 1.08s remaining: 1.33s
224: learn: 0.3528072 total: 1.08s remaining: 1.33s
225: learn: 0.3526505 total: 1.09s remaining: 1.32s
226: learn: 0.3521764 total: 1.09s remaining: 1.31s
227: learn: 0.3518450 total: 1.1s remaining: 1.31s
228: learn: 0.3515647 total: 1.1s remaining: 1.3s
229: learn: 0.3513295 total: 1.11s remaining: 1.3s
230: learn: 0.3511015 total: 1.11s remaining: 1.29s
231: learn: 0.3508733 total: 1.12s remaining: 1.29s
232: learn: 0.3505986 total: 1.12s remaining: 1.28s
233: learn: 0.3503810 total: 1.12s remaining: 1.28s
234: learn: 0.3499257 total: 1.13s remaining: 1.27s
235: learn: 0.3495168 total: 1.13s remaining: 1.27s
236: learn: 0.3493283 total: 1.14s remaining: 1.26s
237: learn: 0.3490038 total: 1.14s remaining: 1.26s
238: learn: 0.3487530 total: 1.15s remaining: 1.25s
239: learn: 0.3483813 total: 1.15s remaining: 1.25s
240: learn: 0.3482022 total: 1.15s remaining: 1.24s
241: learn: 0.3477909 total: 1.16s remaining: 1.24s
242: learn: 0.3475755 total: 1.16s remaining: 1.23s
243: learn: 0.3472890 total: 1.17s remaining: 1.22s
244: learn: 0.3469058 total: 1.17s remaining: 1.22s
245: learn: 0.3466587 total: 1.18s remaining: 1.21s
246: learn: 0.3463363 total: 1.18s remaining: 1.21s
247: learn: 0.3460121 total: 1.18s remaining: 1.2s
248: learn: 0.3456395 total: 1.19s remaining: 1.2s
249: learn: 0.3453782 total: 1.19s remaining: 1.19s
250: learn: 0.3451867 total: 1.2s remaining: 1.19s
251: learn: 0.3450126 total: 1.2s remaining: 1.18s
252: learn: 0.3447237 total: 1.2s remaining: 1.18s
253: learn: 0.3443757 total: 1.21s remaining: 1.17s
254: learn: 0.3440171 total: 1.21s remaining: 1.17s
255: learn: 0.3438118 total: 1.22s remaining: 1.16s
256: learn: 0.3434448 total: 1.22s remaining: 1.15s
257: learn: 0.3430577 total: 1.23s remaining: 1.15s
258: learn: 0.3428079 total: 1.23s remaining: 1.14s
259: learn: 0.3425729 total: 1.23s remaining: 1.14s
260: learn: 0.3422896 total: 1.24s remaining: 1.13s
261: learn: 0.3420298 total: 1.24s remaining: 1.13s
262: learn: 0.3418588 total: 1.25s remaining: 1.12s
263: learn: 0.3415594 total: 1.25s remaining: 1.12s
264: learn: 0.3413364 total: 1.25s remaining: 1.11s
265: learn: 0.3410729 total: 1.26s remaining: 1.11s
266: learn: 0.3407258 total: 1.26s remaining: 1.1s
267: learn: 0.3403525 total: 1.27s remaining: 1.1s
268: learn: 0.3400240 total: 1.27s remaining: 1.09s
269: learn: 0.3397836 total: 1.27s remaining: 1.09s
270: learn: 0.3395970 total: 1.28s remaining: 1.08s
271: learn: 0.3393330 total: 1.28s remaining: 1.08s
272: learn: 0.3390779 total: 1.29s remaining: 1.07s
273: learn: 0.3388413 total: 1.29s remaining: 1.07s
274: learn: 0.3386209 total: 1.3s remaining: 1.06s
275: learn: 0.3384843 total: 1.3s remaining: 1.06s
276: learn: 0.3383269 total: 1.3s remaining: 1.05s
277: learn: 0.3380997 total: 1.31s remaining: 1.04s
278: learn: 0.3376140 total: 1.31s remaining: 1.04s
279: learn: 0.3373889 total: 1.32s remaining: 1.03s
280: learn: 0.3371364 total: 1.32s remaining: 1.03s
281: learn: 0.3369327 total: 1.33s remaining: 1.02s
282: learn: 0.3366869 total: 1.33s remaining: 1.02s
283: learn: 0.3364938 total: 1.33s remaining: 1.01s
284: learn: 0.3361430 total: 1.34s remaining: 1.01s
285: learn: 0.3357590 total: 1.34s remaining: 1s
286: learn: 0.3354642 total: 1.35s remaining: 1000ms
287: learn: 0.3351516 total: 1.35s remaining: 995ms
288: learn: 0.3349655 total: 1.35s remaining: 989ms
289: learn: 0.3346687 total: 1.36s remaining: 984ms
290: learn: 0.3343748 total: 1.36s remaining: 979ms
291: learn: 0.3342099 total: 1.37s remaining: 974ms
292: learn: 0.3339811 total: 1.37s remaining: 969ms
293: learn: 0.3338068 total: 1.38s remaining: 964ms
294: learn: 0.3335994 total: 1.38s remaining: 958ms
295: learn: 0.3334345 total: 1.38s remaining: 953ms
296: learn: 0.3332029 total: 1.39s remaining: 948ms
297: learn: 0.3329870 total: 1.39s remaining: 943ms
298: learn: 0.3327762 total: 1.4s remaining: 938ms
299: learn: 0.3325142 total: 1.4s remaining: 933ms
300: learn: 0.3323433 total: 1.4s remaining: 928ms
301: learn: 0.3320582 total: 1.41s remaining: 923ms
302: learn: 0.3317948 total: 1.41s remaining: 918ms
303: learn: 0.3316476 total: 1.42s remaining: 913ms
304: learn: 0.3315116 total: 1.42s remaining: 908ms
305: learn: 0.3313714 total: 1.42s remaining: 903ms
306: learn: 0.3310924 total: 1.43s remaining: 898ms
307: learn: 0.3307139 total: 1.43s remaining: 893ms
308: learn: 0.3304509 total: 1.44s remaining: 888ms
309: learn: 0.3301529 total: 1.44s remaining: 883ms
310: learn: 0.3299043 total: 1.44s remaining: 878ms
311: learn: 0.3296830 total: 1.45s remaining: 873ms
312: learn: 0.3295532 total: 1.45s remaining: 868ms
313: learn: 0.3293181 total: 1.46s remaining: 863ms
314: learn: 0.3291434 total: 1.46s remaining: 858ms
315: learn: 0.3290008 total: 1.46s remaining: 853ms
316: learn: 0.3287641 total: 1.47s remaining: 848ms
317: learn: 0.3284565 total: 1.47s remaining: 843ms
318: learn: 0.3282028 total: 1.48s remaining: 839ms
319: learn: 0.3279633 total: 1.48s remaining: 834ms
320: learn: 0.3277893 total: 1.49s remaining: 829ms
321: learn: 0.3274594 total: 1.49s remaining: 824ms
322: learn: 0.3271429 total: 1.5s remaining: 820ms
323: learn: 0.3268233 total: 1.5s remaining: 815ms
324: learn: 0.3265536 total: 1.5s remaining: 810ms
325: learn: 0.3263620 total: 1.51s remaining: 805ms
326: learn: 0.3261476 total: 1.51s remaining: 800ms
327: learn: 0.3258907 total: 1.52s remaining: 795ms
328: learn: 0.3256347 total: 1.52s remaining: 790ms
329: learn: 0.3255209 total: 1.52s remaining: 785ms
330: learn: 0.3252162 total: 1.53s remaining: 780ms
331: learn: 0.3249211 total: 1.53s remaining: 776ms
332: learn: 0.3248113 total: 1.54s remaining: 771ms
333: learn: 0.3245877 total: 1.54s remaining: 766ms
334: learn: 0.3244265 total: 1.54s remaining: 761ms
335: learn: 0.3242222 total: 1.55s remaining: 756ms
336: learn: 0.3239710 total: 1.55s remaining: 751ms
337: learn: 0.3236796 total: 1.56s remaining: 746ms
338: learn: 0.3235540 total: 1.56s remaining: 741ms
339: learn: 0.3233247 total: 1.56s remaining: 736ms
340: learn: 0.3231514 total: 1.57s remaining: 732ms
341: learn: 0.3229921 total: 1.57s remaining: 727ms
342: learn: 0.3228261 total: 1.58s remaining: 722ms
343: learn: 0.3225548 total: 1.58s remaining: 717ms
344: learn: 0.3223300 total: 1.58s remaining: 712ms
345: learn: 0.3221700 total: 1.59s remaining: 707ms
346: learn: 0.3218305 total: 1.59s remaining: 703ms
347: learn: 0.3216517 total: 1.6s remaining: 698ms
348: learn: 0.3214773 total: 1.6s remaining: 693ms
349: learn: 0.3212061 total: 1.6s remaining: 688ms
350: learn: 0.3209463 total: 1.61s remaining: 683ms
351: learn: 0.3207009 total: 1.61s remaining: 678ms
352: learn: 0.3204353 total: 1.62s remaining: 674ms
353: learn: 0.3201435 total: 1.62s remaining: 669ms
354: learn: 0.3199788 total: 1.63s remaining: 664ms
355: learn: 0.3197831 total: 1.63s remaining: 659ms
356: learn: 0.3196475 total: 1.63s remaining: 655ms
357: learn: 0.3195041 total: 1.64s remaining: 650ms
358: learn: 0.3193346 total: 1.64s remaining: 645ms
359: learn: 0.3190688 total: 1.65s remaining: 640ms
360: learn: 0.3188957 total: 1.65s remaining: 636ms
361: learn: 0.3186951 total: 1.66s remaining: 631ms
362: learn: 0.3185346 total: 1.66s remaining: 626ms
363: learn: 0.3183291 total: 1.66s remaining: 622ms
364: learn: 0.3181640 total: 1.67s remaining: 617ms
365: learn: 0.3179669 total: 1.67s remaining: 612ms
366: learn: 0.3176881 total: 1.68s remaining: 608ms
367: learn: 0.3174850 total: 1.68s remaining: 603ms
368: learn: 0.3173196 total: 1.68s remaining: 598ms
369: learn: 0.3171836 total: 1.69s remaining: 593ms
370: learn: 0.3169669 total: 1.69s remaining: 589ms
371: learn: 0.3167441 total: 1.7s remaining: 584ms
372: learn: 0.3165412 total: 1.7s remaining: 580ms
373: learn: 0.3162564 total: 1.71s remaining: 575ms
374: learn: 0.3160403 total: 1.71s remaining: 570ms
375: learn: 0.3158625 total: 1.72s remaining: 566ms
376: learn: 0.3156029 total: 1.72s remaining: 561ms
377: learn: 0.3154028 total: 1.72s remaining: 556ms
378: learn: 0.3152610 total: 1.73s remaining: 551ms
379: learn: 0.3150768 total: 1.73s remaining: 547ms
380: learn: 0.3149100 total: 1.74s remaining: 542ms
381: learn: 0.3147168 total: 1.74s remaining: 537ms
382: learn: 0.3144597 total: 1.74s remaining: 533ms
383: learn: 0.3141522 total: 1.75s remaining: 528ms
384: learn: 0.3139195 total: 1.75s remaining: 523ms
385: learn: 0.3137682 total: 1.75s remaining: 519ms
386: learn: 0.3135814 total: 1.76s remaining: 514ms
387: learn: 0.3134311 total: 1.76s remaining: 509ms
388: learn: 0.3131777 total: 1.77s remaining: 504ms
389: learn: 0.3129681 total: 1.77s remaining: 500ms
390: learn: 0.3128155 total: 1.77s remaining: 495ms
391: learn: 0.3126621 total: 1.78s remaining: 490ms
392: learn: 0.3124620 total: 1.78s remaining: 486ms
393: learn: 0.3122817 total: 1.79s remaining: 481ms
394: learn: 0.3119730 total: 1.79s remaining: 476ms
395: learn: 0.3118019 total: 1.8s remaining: 472ms
396: learn: 0.3116760 total: 1.8s remaining: 467ms
397: learn: 0.3114665 total: 1.8s remaining: 462ms
398: learn: 0.3112598 total: 1.81s remaining: 458ms
399: learn: 0.3109805 total: 1.81s remaining: 453ms
400: learn: 0.3108199 total: 1.82s remaining: 449ms
401: learn: 0.3106389 total: 1.82s remaining: 444ms
402: learn: 0.3103400 total: 1.83s remaining: 440ms
403: learn: 0.3101822 total: 1.83s remaining: 435ms
404: learn: 0.3100133 total: 1.83s remaining: 430ms
405: learn: 0.3098858 total: 1.84s remaining: 426ms
406: learn: 0.3098053 total: 1.84s remaining: 421ms
407: learn: 0.3096342 total: 1.85s remaining: 417ms
408: learn: 0.3094406 total: 1.85s remaining: 412ms
409: learn: 0.3092680 total: 1.85s remaining: 407ms
410: learn: 0.3090495 total: 1.86s remaining: 403ms
411: learn: 0.3088651 total: 1.86s remaining: 398ms
412: learn: 0.3086578 total: 1.87s remaining: 393ms
413: learn: 0.3085374 total: 1.87s remaining: 389ms
414: learn: 0.3082606 total: 1.88s remaining: 384ms
415: learn: 0.3080259 total: 1.88s remaining: 380ms
416: learn: 0.3078063 total: 1.88s remaining: 375ms
417: learn: 0.3076169 total: 1.89s remaining: 370ms
418: learn: 0.3074439 total: 1.89s remaining: 366ms
419: learn: 0.3073432 total: 1.9s remaining: 361ms
420: learn: 0.3071002 total: 1.9s remaining: 357ms
421: learn: 0.3069547 total: 1.9s remaining: 352ms
422: learn: 0.3068052 total: 1.91s remaining: 347ms
423: learn: 0.3066443 total: 1.91s remaining: 343ms
424: learn: 0.3064255 total: 1.92s remaining: 338ms
425: learn: 0.3062297 total: 1.92s remaining: 334ms
426: learn: 0.3060150 total: 1.92s remaining: 329ms
427: learn: 0.3059343 total: 1.93s remaining: 324ms
428: learn: 0.3057685 total: 1.93s remaining: 320ms
429: learn: 0.3054583 total: 1.94s remaining: 315ms
430: learn: 0.3052342 total: 1.94s remaining: 311ms
431: learn: 0.3050329 total: 1.94s remaining: 306ms
432: learn: 0.3047790 total: 1.95s remaining: 302ms
433: learn: 0.3045401 total: 1.95s remaining: 297ms
434: learn: 0.3044138 total: 1.96s remaining: 292ms
435: learn: 0.3041903 total: 1.96s remaining: 288ms
436: learn: 0.3040483 total: 1.97s remaining: 283ms
437: learn: 0.3038387 total: 1.97s remaining: 279ms
438: learn: 0.3036834 total: 1.97s remaining: 274ms
439: learn: 0.3035708 total: 1.98s remaining: 270ms
440: learn: 0.3033995 total: 1.98s remaining: 265ms
441: learn: 0.3032521 total: 1.99s remaining: 261ms
442: learn: 0.3031190 total: 1.99s remaining: 256ms
443: learn: 0.3029623 total: 1.99s remaining: 251ms
444: learn: 0.3027578 total: 2s remaining: 247ms
445: learn: 0.3026579 total: 2s remaining: 242ms
446: learn: 0.3024539 total: 2.01s remaining: 238ms
447: learn: 0.3023502 total: 2.01s remaining: 233ms
448: learn: 0.3021773 total: 2.02s remaining: 229ms
449: learn: 0.3019574 total: 2.02s remaining: 224ms
450: learn: 0.3017977 total: 2.02s remaining: 220ms
451: learn: 0.3016176 total: 2.03s remaining: 215ms
452: learn: 0.3014791 total: 2.03s remaining: 211ms
453: learn: 0.3013126 total: 2.04s remaining: 206ms
454: learn: 0.3011083 total: 2.04s remaining: 202ms
455: learn: 0.3009095 total: 2.04s remaining: 197ms
456: learn: 0.3007403 total: 2.05s remaining: 193ms
457: learn: 0.3006040 total: 2.05s remaining: 188ms
458: learn: 0.3002811 total: 2.06s remaining: 184ms
459: learn: 0.2999927 total: 2.06s remaining: 179ms
460: learn: 0.2998643 total: 2.06s remaining: 175ms
461: learn: 0.2997269 total: 2.07s remaining: 170ms
462: learn: 0.2995341 total: 2.07s remaining: 166ms
463: learn: 0.2993643 total: 2.08s remaining: 161ms
464: learn: 0.2992783 total: 2.08s remaining: 157ms
465: learn: 0.2991613 total: 2.09s remaining: 152ms
466: learn: 0.2989938 total: 2.09s remaining: 148ms
467: learn: 0.2987307 total: 2.09s remaining: 143ms
468: learn: 0.2986396 total: 2.1s remaining: 139ms
469: learn: 0.2984617 total: 2.1s remaining: 134ms
470: learn: 0.2983607 total: 2.11s remaining: 130ms
471: learn: 0.2982174 total: 2.11s remaining: 125ms
472: learn: 0.2980570 total: 2.11s remaining: 121ms
473: learn: 0.2979115 total: 2.12s remaining: 116ms
474: learn: 0.2977440 total: 2.12s remaining: 112ms
475: learn: 0.2975670 total: 2.13s remaining: 107ms
476: learn: 0.2974509 total: 2.13s remaining: 103ms
477: learn: 0.2973277 total: 2.13s remaining: 98.2ms
478: learn: 0.2972237 total: 2.14s remaining: 93.8ms
479: learn: 0.2970308 total: 2.14s remaining: 89.3ms
480: learn: 0.2968913 total: 2.15s remaining: 84.8ms
481: learn: 0.2967294 total: 2.15s remaining: 80.3ms
482: learn: 0.2965558 total: 2.15s remaining: 75.9ms
483: learn: 0.2964349 total: 2.16s remaining: 71.4ms
484: learn: 0.2962704 total: 2.16s remaining: 66.9ms
485: learn: 0.2960857 total: 2.17s remaining: 62.5ms
486: learn: 0.2959286 total: 2.17s remaining: 58ms
487: learn: 0.2957958 total: 2.18s remaining: 53.5ms
488: learn: 0.2956658 total: 2.18s remaining: 49.1ms
489: learn: 0.2954544 total: 2.19s remaining: 44.6ms
490: learn: 0.2953442 total: 2.19s remaining: 40.1ms
491: learn: 0.2952219 total: 2.19s remaining: 35.7ms
492: learn: 0.2950469 total: 2.2s remaining: 31.2ms
493: learn: 0.2949626 total: 2.2s remaining: 26.8ms
494: learn: 0.2948345 total: 2.21s remaining: 22.3ms
495: learn: 0.2946874 total: 2.21s remaining: 17.8ms
496: learn: 0.2944452 total: 2.21s remaining: 13.4ms
497: learn: 0.2942655 total: 2.22s remaining: 8.92ms
498: learn: 0.2941078 total: 2.22s remaining: 4.46ms
499: learn: 0.2939681 total: 2.23s remaining: 0us
{'learning_rate': 0.05, 'max_depth': 5, 'n_estimators': 500}
best_score = 0.8124825057279841
cbc_mod = CatBoostClassifier(n_estimators=500, learning_rate=0.05, max_depth=5)
cbc_mod.fit(X_train, y_train)
print('Acc Train =', round(cbc_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(cbc_mod.score(X_test, y_test) * 100, 2))
model_acc['CatBoostClassifier'] = (round(cbc_mod.score(X_test, y_test) * 100, 2), cbc_mod)
0: learn: 0.6715886 total: 4.44ms remaining: 2.22s 1: learn: 0.6497685 total: 8.64ms remaining: 2.15s 2: learn: 0.6313424 total: 13.1ms remaining: 2.16s 3: learn: 0.6165955 total: 17.3ms remaining: 2.15s 4: learn: 0.6047565 total: 21.8ms remaining: 2.16s 5: learn: 0.5901331 total: 25.8ms remaining: 2.13s 6: learn: 0.5783931 total: 29.9ms remaining: 2.11s 7: learn: 0.5692709 total: 33.6ms remaining: 2.07s 8: learn: 0.5601853 total: 38ms remaining: 2.07s 9: learn: 0.5516509 total: 42.3ms remaining: 2.07s 10: learn: 0.5449478 total: 46.1ms remaining: 2.05s 11: learn: 0.5387620 total: 50.4ms remaining: 2.05s 12: learn: 0.5298248 total: 54.7ms remaining: 2.05s 13: learn: 0.5250956 total: 58.9ms remaining: 2.04s 14: learn: 0.5206796 total: 63.5ms remaining: 2.05s 15: learn: 0.5134477 total: 67.9ms remaining: 2.05s 16: learn: 0.5093080 total: 72.2ms remaining: 2.05s 17: learn: 0.5049560 total: 76.6ms remaining: 2.05s 18: learn: 0.5000185 total: 80.9ms remaining: 2.05s 19: learn: 0.4967157 total: 85.1ms remaining: 2.04s 20: learn: 0.4921926 total: 89.4ms remaining: 2.04s 21: learn: 0.4885324 total: 93.6ms remaining: 2.03s 22: learn: 0.4858474 total: 98.5ms remaining: 2.04s 23: learn: 0.4825161 total: 103ms remaining: 2.05s 24: learn: 0.4784191 total: 108ms remaining: 2.05s 25: learn: 0.4753227 total: 112ms remaining: 2.05s 26: learn: 0.4727560 total: 117ms remaining: 2.05s 27: learn: 0.4701519 total: 122ms remaining: 2.05s 28: learn: 0.4672838 total: 126ms remaining: 2.05s 29: learn: 0.4650169 total: 131ms remaining: 2.05s 30: learn: 0.4626239 total: 136ms remaining: 2.06s 31: learn: 0.4600279 total: 140ms remaining: 2.05s 32: learn: 0.4581928 total: 145ms remaining: 2.05s 33: learn: 0.4558674 total: 149ms remaining: 2.04s 34: learn: 0.4532745 total: 153ms remaining: 2.03s 35: learn: 0.4508607 total: 157ms remaining: 2.02s 36: learn: 0.4486973 total: 161ms remaining: 2.01s 37: learn: 0.4468958 total: 165ms remaining: 2.01s 38: learn: 0.4453799 total: 169ms remaining: 2s 39: learn: 0.4440906 total: 174ms remaining: 2s 40: learn: 0.4428139 total: 178ms remaining: 1.99s 41: learn: 0.4411980 total: 182ms remaining: 1.98s 42: learn: 0.4397635 total: 186ms remaining: 1.98s 43: learn: 0.4382574 total: 190ms remaining: 1.97s 44: learn: 0.4362663 total: 195ms remaining: 1.97s 45: learn: 0.4352669 total: 199ms remaining: 1.96s 46: learn: 0.4338928 total: 204ms remaining: 1.96s 47: learn: 0.4331745 total: 208ms remaining: 1.96s 48: learn: 0.4318094 total: 212ms remaining: 1.95s 49: learn: 0.4308120 total: 216ms remaining: 1.95s 50: learn: 0.4294596 total: 221ms remaining: 1.94s 51: learn: 0.4288123 total: 225ms remaining: 1.94s 52: learn: 0.4279420 total: 229ms remaining: 1.93s 53: learn: 0.4269846 total: 234ms remaining: 1.93s 54: learn: 0.4261661 total: 238ms remaining: 1.93s 55: learn: 0.4246760 total: 243ms remaining: 1.93s 56: learn: 0.4234944 total: 248ms remaining: 1.93s 57: learn: 0.4222398 total: 253ms remaining: 1.93s 58: learn: 0.4211157 total: 257ms remaining: 1.92s 59: learn: 0.4206419 total: 262ms remaining: 1.92s 60: learn: 0.4201702 total: 266ms remaining: 1.91s 61: learn: 0.4196927 total: 271ms remaining: 1.91s 62: learn: 0.4191732 total: 275ms remaining: 1.91s 63: learn: 0.4182540 total: 280ms remaining: 1.9s 64: learn: 0.4171922 total: 284ms remaining: 1.9s 65: learn: 0.4161270 total: 289ms remaining: 1.9s 66: learn: 0.4156055 total: 293ms remaining: 1.89s 67: learn: 0.4148372 total: 297ms remaining: 1.89s 68: learn: 0.4140445 total: 303ms remaining: 1.89s 69: learn: 0.4131414 total: 308ms remaining: 1.89s 70: learn: 0.4124092 total: 312ms remaining: 1.89s 71: learn: 0.4118613 total: 317ms remaining: 1.89s 72: learn: 0.4109336 total: 321ms remaining: 1.88s 73: learn: 0.4099184 total: 326ms remaining: 1.88s 74: learn: 0.4090886 total: 330ms remaining: 1.87s 75: learn: 0.4086462 total: 334ms remaining: 1.86s 76: learn: 0.4076658 total: 339ms remaining: 1.86s 77: learn: 0.4071392 total: 343ms remaining: 1.86s 78: learn: 0.4060856 total: 347ms remaining: 1.85s 79: learn: 0.4054211 total: 352ms remaining: 1.85s 80: learn: 0.4047032 total: 356ms remaining: 1.84s 81: learn: 0.4042291 total: 361ms remaining: 1.84s 82: learn: 0.4038277 total: 365ms remaining: 1.83s 83: learn: 0.4034973 total: 369ms remaining: 1.83s 84: learn: 0.4028241 total: 374ms remaining: 1.82s 85: learn: 0.4024115 total: 378ms remaining: 1.82s 86: learn: 0.4013696 total: 383ms remaining: 1.82s 87: learn: 0.4008991 total: 387ms remaining: 1.81s 88: learn: 0.4004595 total: 392ms remaining: 1.81s 89: learn: 0.4001107 total: 396ms remaining: 1.8s 90: learn: 0.3996613 total: 401ms remaining: 1.8s 91: learn: 0.3991905 total: 405ms remaining: 1.8s 92: learn: 0.3984470 total: 409ms remaining: 1.79s 93: learn: 0.3977621 total: 414ms remaining: 1.79s 94: learn: 0.3975819 total: 418ms remaining: 1.78s 95: learn: 0.3972035 total: 423ms remaining: 1.78s 96: learn: 0.3967400 total: 427ms remaining: 1.77s 97: learn: 0.3962854 total: 432ms remaining: 1.77s 98: learn: 0.3959428 total: 437ms remaining: 1.77s 99: learn: 0.3955785 total: 441ms remaining: 1.76s 100: learn: 0.3953569 total: 446ms remaining: 1.76s 101: learn: 0.3947103 total: 450ms remaining: 1.76s 102: learn: 0.3938517 total: 455ms remaining: 1.75s 103: learn: 0.3935609 total: 459ms remaining: 1.75s 104: learn: 0.3928339 total: 463ms remaining: 1.74s 105: learn: 0.3922325 total: 467ms remaining: 1.74s 106: learn: 0.3918105 total: 472ms remaining: 1.73s 107: learn: 0.3913443 total: 476ms remaining: 1.73s 108: learn: 0.3909939 total: 481ms remaining: 1.72s 109: learn: 0.3906304 total: 486ms remaining: 1.72s 110: learn: 0.3901867 total: 490ms remaining: 1.72s 111: learn: 0.3899108 total: 495ms remaining: 1.71s 112: learn: 0.3896551 total: 499ms remaining: 1.71s 113: learn: 0.3890135 total: 504ms remaining: 1.71s 114: learn: 0.3886448 total: 508ms remaining: 1.7s 115: learn: 0.3883451 total: 513ms remaining: 1.7s 116: learn: 0.3880929 total: 517ms remaining: 1.69s 117: learn: 0.3878145 total: 521ms remaining: 1.69s 118: learn: 0.3873152 total: 525ms remaining: 1.68s 119: learn: 0.3870133 total: 529ms remaining: 1.68s 120: learn: 0.3866560 total: 533ms remaining: 1.67s 121: learn: 0.3862752 total: 537ms remaining: 1.67s 122: learn: 0.3855381 total: 542ms remaining: 1.66s 123: learn: 0.3852125 total: 546ms remaining: 1.66s 124: learn: 0.3848299 total: 550ms remaining: 1.65s 125: learn: 0.3844479 total: 555ms remaining: 1.65s 126: learn: 0.3840590 total: 559ms remaining: 1.64s 127: learn: 0.3837507 total: 563ms remaining: 1.64s 128: learn: 0.3833959 total: 568ms remaining: 1.63s 129: learn: 0.3830532 total: 572ms remaining: 1.63s 130: learn: 0.3827598 total: 577ms remaining: 1.62s 131: learn: 0.3821048 total: 581ms remaining: 1.62s 132: learn: 0.3818578 total: 586ms remaining: 1.61s 133: learn: 0.3815380 total: 590ms remaining: 1.61s 134: learn: 0.3813210 total: 595ms remaining: 1.61s 135: learn: 0.3810244 total: 600ms remaining: 1.6s 136: learn: 0.3808358 total: 605ms remaining: 1.6s 137: learn: 0.3806206 total: 609ms remaining: 1.6s 138: learn: 0.3803516 total: 613ms remaining: 1.59s 139: learn: 0.3801208 total: 618ms remaining: 1.59s 140: learn: 0.3797165 total: 622ms remaining: 1.58s 141: learn: 0.3790137 total: 626ms remaining: 1.58s 142: learn: 0.3786261 total: 631ms remaining: 1.57s 143: learn: 0.3783452 total: 635ms remaining: 1.57s 144: learn: 0.3780608 total: 670ms remaining: 1.64s 145: learn: 0.3777550 total: 675ms remaining: 1.64s 146: learn: 0.3774761 total: 679ms remaining: 1.63s 147: learn: 0.3769238 total: 683ms remaining: 1.63s 148: learn: 0.3767423 total: 688ms remaining: 1.62s 149: learn: 0.3763279 total: 694ms remaining: 1.62s 150: learn: 0.3760566 total: 700ms remaining: 1.62s 151: learn: 0.3757014 total: 704ms remaining: 1.61s 152: learn: 0.3753736 total: 708ms remaining: 1.61s 153: learn: 0.3750068 total: 713ms remaining: 1.6s 154: learn: 0.3746859 total: 717ms remaining: 1.59s 155: learn: 0.3743231 total: 721ms remaining: 1.59s 156: learn: 0.3740556 total: 726ms remaining: 1.58s 157: learn: 0.3737772 total: 732ms remaining: 1.58s 158: learn: 0.3735753 total: 738ms remaining: 1.58s 159: learn: 0.3734153 total: 742ms remaining: 1.58s 160: learn: 0.3728946 total: 747ms remaining: 1.57s 161: learn: 0.3724313 total: 753ms remaining: 1.57s 162: learn: 0.3719776 total: 757ms remaining: 1.56s 163: learn: 0.3717913 total: 762ms remaining: 1.56s 164: learn: 0.3715023 total: 766ms remaining: 1.55s 165: learn: 0.3712874 total: 770ms remaining: 1.55s 166: learn: 0.3709246 total: 774ms remaining: 1.54s 167: learn: 0.3705663 total: 779ms remaining: 1.54s 168: learn: 0.3703596 total: 783ms remaining: 1.53s 169: learn: 0.3701783 total: 788ms remaining: 1.53s 170: learn: 0.3698647 total: 793ms remaining: 1.52s 171: learn: 0.3689951 total: 797ms remaining: 1.52s 172: learn: 0.3687132 total: 802ms remaining: 1.51s 173: learn: 0.3684437 total: 806ms remaining: 1.51s 174: learn: 0.3680575 total: 810ms remaining: 1.5s 175: learn: 0.3677380 total: 814ms remaining: 1.5s 176: learn: 0.3675157 total: 818ms remaining: 1.49s 177: learn: 0.3672580 total: 822ms remaining: 1.49s 178: learn: 0.3667561 total: 827ms remaining: 1.48s 179: learn: 0.3665125 total: 831ms remaining: 1.48s 180: learn: 0.3662650 total: 835ms remaining: 1.47s 181: learn: 0.3658646 total: 839ms remaining: 1.47s 182: learn: 0.3654361 total: 843ms remaining: 1.46s 183: learn: 0.3651905 total: 847ms remaining: 1.45s 184: learn: 0.3647995 total: 851ms remaining: 1.45s 185: learn: 0.3644163 total: 855ms remaining: 1.44s 186: learn: 0.3641485 total: 859ms remaining: 1.44s 187: learn: 0.3639176 total: 863ms remaining: 1.43s 188: learn: 0.3636846 total: 867ms remaining: 1.43s 189: learn: 0.3632983 total: 872ms remaining: 1.42s 190: learn: 0.3630024 total: 876ms remaining: 1.42s 191: learn: 0.3628494 total: 880ms remaining: 1.41s 192: learn: 0.3624410 total: 884ms remaining: 1.41s 193: learn: 0.3620925 total: 888ms remaining: 1.4s 194: learn: 0.3615618 total: 892ms remaining: 1.39s 195: learn: 0.3613007 total: 896ms remaining: 1.39s 196: learn: 0.3610058 total: 900ms remaining: 1.38s 197: learn: 0.3606136 total: 904ms remaining: 1.38s 198: learn: 0.3603982 total: 908ms remaining: 1.37s 199: learn: 0.3601614 total: 913ms remaining: 1.37s 200: learn: 0.3598831 total: 917ms remaining: 1.36s 201: learn: 0.3594937 total: 921ms remaining: 1.36s 202: learn: 0.3591371 total: 926ms remaining: 1.35s 203: learn: 0.3587662 total: 930ms remaining: 1.35s 204: learn: 0.3585065 total: 934ms remaining: 1.34s 205: learn: 0.3582238 total: 939ms remaining: 1.34s 206: learn: 0.3579861 total: 943ms remaining: 1.33s 207: learn: 0.3575965 total: 948ms remaining: 1.33s 208: learn: 0.3573726 total: 952ms remaining: 1.32s 209: learn: 0.3571523 total: 956ms remaining: 1.32s 210: learn: 0.3567146 total: 961ms remaining: 1.31s 211: learn: 0.3564900 total: 965ms remaining: 1.31s 212: learn: 0.3562025 total: 970ms remaining: 1.31s 213: learn: 0.3560374 total: 974ms remaining: 1.3s 214: learn: 0.3558365 total: 979ms remaining: 1.3s 215: learn: 0.3554548 total: 984ms remaining: 1.29s 216: learn: 0.3551950 total: 988ms remaining: 1.29s 217: learn: 0.3548987 total: 992ms remaining: 1.28s 218: learn: 0.3546235 total: 996ms remaining: 1.28s 219: learn: 0.3544548 total: 1s remaining: 1.27s 220: learn: 0.3539808 total: 1s remaining: 1.27s 221: learn: 0.3536188 total: 1.01s remaining: 1.26s 222: learn: 0.3534571 total: 1.01s remaining: 1.26s 223: learn: 0.3530476 total: 1.02s remaining: 1.25s 224: learn: 0.3528072 total: 1.02s remaining: 1.25s 225: learn: 0.3526505 total: 1.02s remaining: 1.24s 226: learn: 0.3521764 total: 1.03s remaining: 1.24s 227: learn: 0.3518450 total: 1.03s remaining: 1.23s 228: learn: 0.3515647 total: 1.04s remaining: 1.23s 229: learn: 0.3513295 total: 1.04s remaining: 1.22s 230: learn: 0.3511015 total: 1.05s remaining: 1.22s 231: learn: 0.3508733 total: 1.05s remaining: 1.21s 232: learn: 0.3505986 total: 1.05s remaining: 1.21s 233: learn: 0.3503810 total: 1.06s remaining: 1.2s 234: learn: 0.3499257 total: 1.06s remaining: 1.2s 235: learn: 0.3495168 total: 1.07s remaining: 1.19s 236: learn: 0.3493283 total: 1.07s remaining: 1.19s 237: learn: 0.3490038 total: 1.07s remaining: 1.18s 238: learn: 0.3487530 total: 1.08s remaining: 1.18s 239: learn: 0.3483813 total: 1.08s remaining: 1.18s 240: learn: 0.3482022 total: 1.09s remaining: 1.17s 241: learn: 0.3477909 total: 1.09s remaining: 1.17s 242: learn: 0.3475755 total: 1.1s remaining: 1.16s 243: learn: 0.3472890 total: 1.1s remaining: 1.16s 244: learn: 0.3469058 total: 1.11s remaining: 1.15s 245: learn: 0.3466587 total: 1.11s remaining: 1.15s 246: learn: 0.3463363 total: 1.11s remaining: 1.14s 247: learn: 0.3460121 total: 1.12s remaining: 1.14s 248: learn: 0.3456395 total: 1.12s remaining: 1.13s 249: learn: 0.3453782 total: 1.13s remaining: 1.13s 250: learn: 0.3451867 total: 1.13s remaining: 1.12s 251: learn: 0.3450126 total: 1.14s remaining: 1.12s 252: learn: 0.3447237 total: 1.14s remaining: 1.11s 253: learn: 0.3443757 total: 1.15s remaining: 1.11s 254: learn: 0.3440171 total: 1.15s remaining: 1.1s 255: learn: 0.3438118 total: 1.15s remaining: 1.1s 256: learn: 0.3434448 total: 1.16s remaining: 1.09s 257: learn: 0.3430577 total: 1.16s remaining: 1.09s 258: learn: 0.3428079 total: 1.17s remaining: 1.08s 259: learn: 0.3425729 total: 1.17s remaining: 1.08s 260: learn: 0.3422896 total: 1.17s remaining: 1.07s 261: learn: 0.3420298 total: 1.18s remaining: 1.07s 262: learn: 0.3418588 total: 1.18s remaining: 1.07s 263: learn: 0.3415594 total: 1.19s remaining: 1.06s 264: learn: 0.3413364 total: 1.19s remaining: 1.06s 265: learn: 0.3410729 total: 1.2s remaining: 1.05s 266: learn: 0.3407258 total: 1.2s remaining: 1.05s 267: learn: 0.3403525 total: 1.2s remaining: 1.04s 268: learn: 0.3400240 total: 1.21s remaining: 1.04s 269: learn: 0.3397836 total: 1.21s remaining: 1.03s 270: learn: 0.3395970 total: 1.22s remaining: 1.03s 271: learn: 0.3393330 total: 1.22s remaining: 1.02s 272: learn: 0.3390779 total: 1.22s remaining: 1.02s 273: learn: 0.3388413 total: 1.23s remaining: 1.01s 274: learn: 0.3386209 total: 1.23s remaining: 1.01s 275: learn: 0.3384843 total: 1.24s remaining: 1s 276: learn: 0.3383269 total: 1.24s remaining: 999ms 277: learn: 0.3380997 total: 1.24s remaining: 994ms 278: learn: 0.3376140 total: 1.25s remaining: 989ms 279: learn: 0.3373889 total: 1.25s remaining: 985ms 280: learn: 0.3371364 total: 1.26s remaining: 980ms 281: learn: 0.3369327 total: 1.26s remaining: 975ms 282: learn: 0.3366869 total: 1.26s remaining: 970ms 283: learn: 0.3364938 total: 1.27s remaining: 966ms 284: learn: 0.3361430 total: 1.27s remaining: 961ms 285: learn: 0.3357590 total: 1.28s remaining: 957ms 286: learn: 0.3354642 total: 1.28s remaining: 953ms 287: learn: 0.3351516 total: 1.29s remaining: 948ms 288: learn: 0.3349655 total: 1.29s remaining: 944ms 289: learn: 0.3346687 total: 1.3s remaining: 940ms 290: learn: 0.3343748 total: 1.3s remaining: 935ms 291: learn: 0.3342099 total: 1.31s remaining: 931ms 292: learn: 0.3339811 total: 1.31s remaining: 926ms 293: learn: 0.3338068 total: 1.31s remaining: 921ms 294: learn: 0.3335994 total: 1.32s remaining: 917ms 295: learn: 0.3334345 total: 1.32s remaining: 912ms 296: learn: 0.3332029 total: 1.33s remaining: 907ms 297: learn: 0.3329870 total: 1.33s remaining: 903ms 298: learn: 0.3327762 total: 1.34s remaining: 898ms 299: learn: 0.3325142 total: 1.34s remaining: 894ms 300: learn: 0.3323433 total: 1.34s remaining: 889ms 301: learn: 0.3320582 total: 1.35s remaining: 884ms 302: learn: 0.3317948 total: 1.35s remaining: 880ms 303: learn: 0.3316476 total: 1.36s remaining: 875ms 304: learn: 0.3315116 total: 1.36s remaining: 871ms 305: learn: 0.3313714 total: 1.36s remaining: 866ms 306: learn: 0.3310924 total: 1.37s remaining: 861ms 307: learn: 0.3307139 total: 1.37s remaining: 857ms 308: learn: 0.3304509 total: 1.38s remaining: 852ms 309: learn: 0.3301529 total: 1.38s remaining: 848ms 310: learn: 0.3299043 total: 1.39s remaining: 844ms 311: learn: 0.3296830 total: 1.39s remaining: 839ms 312: learn: 0.3295532 total: 1.4s remaining: 835ms 313: learn: 0.3293181 total: 1.4s remaining: 831ms 314: learn: 0.3291434 total: 1.41s remaining: 826ms 315: learn: 0.3290008 total: 1.41s remaining: 822ms 316: learn: 0.3287641 total: 1.42s remaining: 817ms 317: learn: 0.3284565 total: 1.42s remaining: 813ms 318: learn: 0.3282028 total: 1.42s remaining: 808ms 319: learn: 0.3279633 total: 1.43s remaining: 803ms 320: learn: 0.3277893 total: 1.43s remaining: 799ms 321: learn: 0.3274594 total: 1.44s remaining: 794ms 322: learn: 0.3271429 total: 1.44s remaining: 790ms 323: learn: 0.3268233 total: 1.45s remaining: 785ms 324: learn: 0.3265536 total: 1.45s remaining: 781ms 325: learn: 0.3263620 total: 1.45s remaining: 776ms 326: learn: 0.3261476 total: 1.46s remaining: 771ms 327: learn: 0.3258907 total: 1.46s remaining: 767ms 328: learn: 0.3256347 total: 1.47s remaining: 762ms 329: learn: 0.3255209 total: 1.47s remaining: 758ms 330: learn: 0.3252162 total: 1.48s remaining: 753ms 331: learn: 0.3249211 total: 1.48s remaining: 749ms 332: learn: 0.3248113 total: 1.48s remaining: 744ms 333: learn: 0.3245877 total: 1.49s remaining: 739ms 334: learn: 0.3244265 total: 1.49s remaining: 735ms 335: learn: 0.3242222 total: 1.5s remaining: 730ms 336: learn: 0.3239710 total: 1.5s remaining: 726ms 337: learn: 0.3236796 total: 1.5s remaining: 721ms 338: learn: 0.3235540 total: 1.51s remaining: 717ms 339: learn: 0.3233247 total: 1.51s remaining: 712ms 340: learn: 0.3231514 total: 1.52s remaining: 707ms 341: learn: 0.3229921 total: 1.52s remaining: 703ms 342: learn: 0.3228261 total: 1.52s remaining: 698ms 343: learn: 0.3225548 total: 1.53s remaining: 694ms 344: learn: 0.3223300 total: 1.53s remaining: 689ms 345: learn: 0.3221700 total: 1.54s remaining: 685ms 346: learn: 0.3218305 total: 1.54s remaining: 680ms 347: learn: 0.3216517 total: 1.55s remaining: 676ms 348: learn: 0.3214773 total: 1.55s remaining: 671ms 349: learn: 0.3212061 total: 1.55s remaining: 667ms 350: learn: 0.3209463 total: 1.56s remaining: 662ms 351: learn: 0.3207009 total: 1.56s remaining: 658ms 352: learn: 0.3204353 total: 1.57s remaining: 653ms 353: learn: 0.3201435 total: 1.57s remaining: 649ms 354: learn: 0.3199788 total: 1.58s remaining: 644ms 355: learn: 0.3197831 total: 1.58s remaining: 640ms 356: learn: 0.3196475 total: 1.59s remaining: 635ms 357: learn: 0.3195041 total: 1.59s remaining: 631ms 358: learn: 0.3193346 total: 1.59s remaining: 626ms 359: learn: 0.3190688 total: 1.6s remaining: 622ms 360: learn: 0.3188957 total: 1.6s remaining: 618ms 361: learn: 0.3186951 total: 1.61s remaining: 613ms 362: learn: 0.3185346 total: 1.61s remaining: 609ms 363: learn: 0.3183291 total: 1.62s remaining: 604ms 364: learn: 0.3181640 total: 1.62s remaining: 600ms 365: learn: 0.3179669 total: 1.63s remaining: 596ms 366: learn: 0.3176881 total: 1.63s remaining: 591ms 367: learn: 0.3174850 total: 1.64s remaining: 587ms 368: learn: 0.3173196 total: 1.64s remaining: 583ms 369: learn: 0.3171836 total: 1.65s remaining: 578ms 370: learn: 0.3169669 total: 1.65s remaining: 574ms 371: learn: 0.3167441 total: 1.66s remaining: 570ms 372: learn: 0.3165412 total: 1.66s remaining: 565ms 373: learn: 0.3162564 total: 1.66s remaining: 561ms 374: learn: 0.3160403 total: 1.67s remaining: 556ms 375: learn: 0.3158625 total: 1.67s remaining: 552ms 376: learn: 0.3156029 total: 1.68s remaining: 547ms 377: learn: 0.3154028 total: 1.68s remaining: 543ms 378: learn: 0.3152610 total: 1.69s remaining: 539ms 379: learn: 0.3150768 total: 1.69s remaining: 534ms 380: learn: 0.3149100 total: 1.7s remaining: 530ms 381: learn: 0.3147168 total: 1.7s remaining: 526ms 382: learn: 0.3144597 total: 1.71s remaining: 521ms 383: learn: 0.3141522 total: 1.71s remaining: 517ms 384: learn: 0.3139195 total: 1.72s remaining: 513ms 385: learn: 0.3137682 total: 1.72s remaining: 508ms 386: learn: 0.3135814 total: 1.73s remaining: 504ms 387: learn: 0.3134311 total: 1.73s remaining: 499ms 388: learn: 0.3131777 total: 1.73s remaining: 495ms 389: learn: 0.3129681 total: 1.74s remaining: 490ms 390: learn: 0.3128155 total: 1.74s remaining: 486ms 391: learn: 0.3126621 total: 1.75s remaining: 482ms 392: learn: 0.3124620 total: 1.75s remaining: 477ms 393: learn: 0.3122817 total: 1.76s remaining: 473ms 394: learn: 0.3119730 total: 1.76s remaining: 468ms 395: learn: 0.3118019 total: 1.77s remaining: 464ms 396: learn: 0.3116760 total: 1.77s remaining: 459ms 397: learn: 0.3114665 total: 1.77s remaining: 455ms 398: learn: 0.3112598 total: 1.78s remaining: 450ms 399: learn: 0.3109805 total: 1.78s remaining: 446ms 400: learn: 0.3108199 total: 1.79s remaining: 442ms 401: learn: 0.3106389 total: 1.79s remaining: 437ms 402: learn: 0.3103400 total: 1.8s remaining: 433ms 403: learn: 0.3101822 total: 1.8s remaining: 428ms 404: learn: 0.3100133 total: 1.81s remaining: 424ms 405: learn: 0.3098858 total: 1.81s remaining: 420ms 406: learn: 0.3098053 total: 1.82s remaining: 415ms 407: learn: 0.3096342 total: 1.82s remaining: 411ms 408: learn: 0.3094406 total: 1.82s remaining: 406ms 409: learn: 0.3092680 total: 1.83s remaining: 402ms 410: learn: 0.3090495 total: 1.83s remaining: 397ms 411: learn: 0.3088651 total: 1.84s remaining: 393ms 412: learn: 0.3086578 total: 1.84s remaining: 388ms 413: learn: 0.3085374 total: 1.85s remaining: 384ms 414: learn: 0.3082606 total: 1.85s remaining: 379ms 415: learn: 0.3080259 total: 1.86s remaining: 375ms 416: learn: 0.3078063 total: 1.86s remaining: 371ms 417: learn: 0.3076169 total: 1.87s remaining: 366ms 418: learn: 0.3074439 total: 1.87s remaining: 362ms 419: learn: 0.3073432 total: 1.88s remaining: 357ms 420: learn: 0.3071002 total: 1.88s remaining: 353ms 421: learn: 0.3069547 total: 1.89s remaining: 349ms 422: learn: 0.3068052 total: 1.89s remaining: 344ms 423: learn: 0.3066443 total: 1.9s remaining: 340ms 424: learn: 0.3064255 total: 1.9s remaining: 335ms 425: learn: 0.3062297 total: 1.91s remaining: 331ms 426: learn: 0.3060150 total: 1.91s remaining: 327ms 427: learn: 0.3059343 total: 1.91s remaining: 322ms 428: learn: 0.3057685 total: 1.92s remaining: 318ms 429: learn: 0.3054583 total: 1.92s remaining: 313ms 430: learn: 0.3052342 total: 1.93s remaining: 309ms 431: learn: 0.3050329 total: 1.93s remaining: 304ms 432: learn: 0.3047790 total: 1.94s remaining: 300ms 433: learn: 0.3045401 total: 1.94s remaining: 295ms 434: learn: 0.3044138 total: 1.95s remaining: 291ms 435: learn: 0.3041903 total: 1.95s remaining: 286ms 436: learn: 0.3040483 total: 1.95s remaining: 282ms 437: learn: 0.3038387 total: 1.96s remaining: 277ms 438: learn: 0.3036834 total: 1.96s remaining: 273ms 439: learn: 0.3035708 total: 1.97s remaining: 268ms 440: learn: 0.3033995 total: 1.97s remaining: 264ms 441: learn: 0.3032521 total: 1.98s remaining: 259ms 442: learn: 0.3031190 total: 1.98s remaining: 255ms 443: learn: 0.3029623 total: 1.99s remaining: 250ms 444: learn: 0.3027578 total: 1.99s remaining: 246ms 445: learn: 0.3026579 total: 1.99s remaining: 241ms 446: learn: 0.3024539 total: 2s remaining: 237ms 447: learn: 0.3023502 total: 2s remaining: 233ms 448: learn: 0.3021773 total: 2.01s remaining: 228ms 449: learn: 0.3019574 total: 2.01s remaining: 224ms 450: learn: 0.3017977 total: 2.02s remaining: 219ms 451: learn: 0.3016176 total: 2.02s remaining: 215ms 452: learn: 0.3014791 total: 2.02s remaining: 210ms 453: learn: 0.3013126 total: 2.03s remaining: 206ms 454: learn: 0.3011083 total: 2.03s remaining: 201ms 455: learn: 0.3009095 total: 2.04s remaining: 197ms 456: learn: 0.3007403 total: 2.04s remaining: 192ms 457: learn: 0.3006040 total: 2.05s remaining: 188ms 458: learn: 0.3002811 total: 2.05s remaining: 183ms 459: learn: 0.2999927 total: 2.06s remaining: 179ms 460: learn: 0.2998643 total: 2.06s remaining: 174ms 461: learn: 0.2997269 total: 2.06s remaining: 170ms 462: learn: 0.2995341 total: 2.07s remaining: 165ms 463: learn: 0.2993643 total: 2.07s remaining: 161ms 464: learn: 0.2992783 total: 2.08s remaining: 156ms 465: learn: 0.2991613 total: 2.08s remaining: 152ms 466: learn: 0.2989938 total: 2.09s remaining: 147ms 467: learn: 0.2987307 total: 2.09s remaining: 143ms 468: learn: 0.2986396 total: 2.1s remaining: 139ms 469: learn: 0.2984617 total: 2.1s remaining: 134ms 470: learn: 0.2983607 total: 2.1s remaining: 130ms 471: learn: 0.2982174 total: 2.11s remaining: 125ms 472: learn: 0.2980570 total: 2.11s remaining: 121ms 473: learn: 0.2979115 total: 2.12s remaining: 116ms 474: learn: 0.2977440 total: 2.12s remaining: 112ms 475: learn: 0.2975670 total: 2.13s remaining: 107ms 476: learn: 0.2974509 total: 2.13s remaining: 103ms 477: learn: 0.2973277 total: 2.13s remaining: 98.3ms 478: learn: 0.2972237 total: 2.14s remaining: 93.8ms 479: learn: 0.2970308 total: 2.14s remaining: 89.4ms 480: learn: 0.2968913 total: 2.15s remaining: 84.9ms 481: learn: 0.2967294 total: 2.15s remaining: 80.4ms 482: learn: 0.2965558 total: 2.16s remaining: 75.9ms 483: learn: 0.2964349 total: 2.16s remaining: 71.5ms 484: learn: 0.2962704 total: 2.17s remaining: 67ms 485: learn: 0.2960857 total: 2.17s remaining: 62.5ms 486: learn: 0.2959286 total: 2.17s remaining: 58.1ms 487: learn: 0.2957958 total: 2.18s remaining: 53.6ms 488: learn: 0.2956658 total: 2.18s remaining: 49.1ms 489: learn: 0.2954544 total: 2.19s remaining: 44.7ms 490: learn: 0.2953442 total: 2.19s remaining: 40.2ms 491: learn: 0.2952219 total: 2.2s remaining: 35.7ms 492: learn: 0.2950469 total: 2.2s remaining: 31.3ms 493: learn: 0.2949626 total: 2.21s remaining: 26.8ms 494: learn: 0.2948345 total: 2.21s remaining: 22.3ms 495: learn: 0.2946874 total: 2.21s remaining: 17.9ms 496: learn: 0.2944452 total: 2.22s remaining: 13.4ms 497: learn: 0.2942655 total: 2.22s remaining: 8.93ms 498: learn: 0.2941078 total: 2.23s remaining: 4.46ms 499: learn: 0.2939681 total: 2.23s remaining: 0us Acc Train = 87.5 Acc Test = 82.17
y_test_pred_prob = pd.DataFrame(cbc_mod.predict_proba(X_test)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['CatBoostClassifier'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.8397212543554007 fpr = 0.1947608200455581 threshold = 0.4969305625069996
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.497 = 82.17366
cbc_mod_all = CatBoostClassifier(n_estimators=500, learning_rate=0.05, max_depth=5) #var1 BEST!!!!!
# cbc_mod_all = CatBoostClassifier(n_estimators=400, learning_rate=0.05, max_depth=5) #var1
# cbc_mod_all = CatBoostClassifier(n_estimators=100, learning_rate=0.01, max_depth=8) #var2 schlechter als var1
# {'learning_rate': 0.01, 'max_depth': 8, 'n_estimators': 100}
cbc_mod_all.fit(X, y)
0: learn: 0.6717601 total: 18.1ms remaining: 9.04s 1: learn: 0.6559285 total: 32.4ms remaining: 8.07s 2: learn: 0.6367266 total: 40.2ms remaining: 6.67s 3: learn: 0.6199487 total: 48ms remaining: 5.95s 4: learn: 0.6064402 total: 55.3ms remaining: 5.47s 5: learn: 0.5930960 total: 61.9ms remaining: 5.1s 6: learn: 0.5824253 total: 66.4ms remaining: 4.68s 7: learn: 0.5713423 total: 70.9ms remaining: 4.36s 8: learn: 0.5628770 total: 75.2ms remaining: 4.1s 9: learn: 0.5510038 total: 79.8ms remaining: 3.91s 10: learn: 0.5445644 total: 83.8ms remaining: 3.73s 11: learn: 0.5371470 total: 88.3ms remaining: 3.59s 12: learn: 0.5298268 total: 92.9ms remaining: 3.48s 13: learn: 0.5214459 total: 97.5ms remaining: 3.38s 14: learn: 0.5153465 total: 102ms remaining: 3.3s 15: learn: 0.5113514 total: 106ms remaining: 3.22s 16: learn: 0.5073899 total: 111ms remaining: 3.15s 17: learn: 0.5030747 total: 116ms remaining: 3.09s 18: learn: 0.4994706 total: 120ms remaining: 3.04s 19: learn: 0.4945849 total: 125ms remaining: 2.99s 20: learn: 0.4913744 total: 129ms remaining: 2.95s 21: learn: 0.4887091 total: 134ms remaining: 2.9s 22: learn: 0.4857981 total: 138ms remaining: 2.86s 23: learn: 0.4825017 total: 142ms remaining: 2.82s 24: learn: 0.4786623 total: 147ms remaining: 2.79s 25: learn: 0.4768492 total: 151ms remaining: 2.76s 26: learn: 0.4734889 total: 156ms remaining: 2.73s 27: learn: 0.4710118 total: 160ms remaining: 2.7s 28: learn: 0.4684620 total: 165ms remaining: 2.68s 29: learn: 0.4662380 total: 169ms remaining: 2.65s 30: learn: 0.4646370 total: 174ms remaining: 2.63s 31: learn: 0.4631152 total: 178ms remaining: 2.61s 32: learn: 0.4609342 total: 183ms remaining: 2.59s 33: learn: 0.4592114 total: 187ms remaining: 2.57s 34: learn: 0.4569889 total: 192ms remaining: 2.56s 35: learn: 0.4545138 total: 197ms remaining: 2.54s 36: learn: 0.4531351 total: 202ms remaining: 2.53s 37: learn: 0.4511333 total: 207ms remaining: 2.51s 38: learn: 0.4500232 total: 212ms remaining: 2.51s 39: learn: 0.4485887 total: 217ms remaining: 2.5s 40: learn: 0.4469055 total: 222ms remaining: 2.48s 41: learn: 0.4453079 total: 226ms remaining: 2.47s 42: learn: 0.4442801 total: 231ms remaining: 2.46s 43: learn: 0.4430573 total: 236ms remaining: 2.44s 44: learn: 0.4417083 total: 240ms remaining: 2.43s 45: learn: 0.4399265 total: 246ms remaining: 2.42s 46: learn: 0.4383401 total: 251ms remaining: 2.41s 47: learn: 0.4373329 total: 256ms remaining: 2.41s 48: learn: 0.4364893 total: 260ms remaining: 2.4s 49: learn: 0.4356206 total: 265ms remaining: 2.38s 50: learn: 0.4342541 total: 270ms remaining: 2.37s 51: learn: 0.4335778 total: 274ms remaining: 2.36s 52: learn: 0.4320319 total: 279ms remaining: 2.35s 53: learn: 0.4309256 total: 283ms remaining: 2.34s 54: learn: 0.4299535 total: 287ms remaining: 2.33s 55: learn: 0.4285534 total: 292ms remaining: 2.31s 56: learn: 0.4275038 total: 296ms remaining: 2.3s 57: learn: 0.4263364 total: 301ms remaining: 2.29s 58: learn: 0.4256917 total: 305ms remaining: 2.28s 59: learn: 0.4252728 total: 310ms remaining: 2.27s 60: learn: 0.4249211 total: 314ms remaining: 2.26s 61: learn: 0.4242374 total: 319ms remaining: 2.25s 62: learn: 0.4231913 total: 323ms remaining: 2.24s 63: learn: 0.4224360 total: 328ms remaining: 2.23s 64: learn: 0.4217434 total: 332ms remaining: 2.22s 65: learn: 0.4211012 total: 337ms remaining: 2.21s 66: learn: 0.4201431 total: 341ms remaining: 2.2s 67: learn: 0.4187252 total: 346ms remaining: 2.19s 68: learn: 0.4175828 total: 350ms remaining: 2.19s 69: learn: 0.4167893 total: 356ms remaining: 2.19s 70: learn: 0.4160514 total: 361ms remaining: 2.18s 71: learn: 0.4152627 total: 366ms remaining: 2.18s 72: learn: 0.4145157 total: 371ms remaining: 2.17s 73: learn: 0.4136453 total: 376ms remaining: 2.16s 74: learn: 0.4125564 total: 381ms remaining: 2.16s 75: learn: 0.4116931 total: 386ms remaining: 2.15s 76: learn: 0.4110673 total: 390ms remaining: 2.14s 77: learn: 0.4106763 total: 395ms remaining: 2.13s 78: learn: 0.4099986 total: 399ms remaining: 2.13s 79: learn: 0.4095265 total: 404ms remaining: 2.12s 80: learn: 0.4089138 total: 409ms remaining: 2.11s 81: learn: 0.4084111 total: 413ms remaining: 2.11s 82: learn: 0.4074390 total: 418ms remaining: 2.1s 83: learn: 0.4069154 total: 422ms remaining: 2.09s 84: learn: 0.4064620 total: 427ms remaining: 2.08s 85: learn: 0.4059731 total: 432ms remaining: 2.08s 86: learn: 0.4057280 total: 437ms remaining: 2.07s 87: learn: 0.4052259 total: 441ms remaining: 2.07s 88: learn: 0.4046340 total: 446ms remaining: 2.06s 89: learn: 0.4039782 total: 450ms remaining: 2.05s 90: learn: 0.4032069 total: 455ms remaining: 2.04s 91: learn: 0.4026369 total: 460ms remaining: 2.04s 92: learn: 0.4019732 total: 464ms remaining: 2.03s 93: learn: 0.4015387 total: 469ms remaining: 2.03s 94: learn: 0.4009854 total: 474ms remaining: 2.02s 95: learn: 0.4006115 total: 479ms remaining: 2.02s 96: learn: 0.3994422 total: 484ms remaining: 2.01s 97: learn: 0.3987352 total: 489ms remaining: 2s 98: learn: 0.3982851 total: 494ms remaining: 2s 99: learn: 0.3979062 total: 499ms remaining: 2s 100: learn: 0.3974923 total: 504ms remaining: 1.99s 101: learn: 0.3972004 total: 509ms remaining: 1.98s 102: learn: 0.3968142 total: 513ms remaining: 1.98s 103: learn: 0.3965742 total: 518ms remaining: 1.97s 104: learn: 0.3963670 total: 523ms remaining: 1.97s 105: learn: 0.3960710 total: 528ms remaining: 1.96s 106: learn: 0.3955506 total: 533ms remaining: 1.96s 107: learn: 0.3949300 total: 538ms remaining: 1.95s 108: learn: 0.3943364 total: 542ms remaining: 1.95s 109: learn: 0.3937671 total: 547ms remaining: 1.94s 110: learn: 0.3935486 total: 552ms remaining: 1.93s 111: learn: 0.3931848 total: 557ms remaining: 1.93s 112: learn: 0.3927611 total: 561ms remaining: 1.92s 113: learn: 0.3924328 total: 566ms remaining: 1.92s 114: learn: 0.3920901 total: 571ms remaining: 1.91s 115: learn: 0.3915730 total: 575ms remaining: 1.9s 116: learn: 0.3912976 total: 580ms remaining: 1.9s 117: learn: 0.3909855 total: 585ms remaining: 1.89s 118: learn: 0.3906130 total: 590ms remaining: 1.89s 119: learn: 0.3903298 total: 594ms remaining: 1.88s 120: learn: 0.3900764 total: 599ms remaining: 1.88s 121: learn: 0.3898187 total: 603ms remaining: 1.87s 122: learn: 0.3894357 total: 608ms remaining: 1.86s 123: learn: 0.3889836 total: 612ms remaining: 1.86s 124: learn: 0.3885785 total: 617ms remaining: 1.85s 125: learn: 0.3882237 total: 621ms remaining: 1.84s 126: learn: 0.3880068 total: 626ms remaining: 1.84s 127: learn: 0.3877897 total: 630ms remaining: 1.83s 128: learn: 0.3875918 total: 634ms remaining: 1.82s 129: learn: 0.3869848 total: 639ms remaining: 1.82s 130: learn: 0.3864431 total: 643ms remaining: 1.81s 131: learn: 0.3862427 total: 648ms remaining: 1.81s 132: learn: 0.3860463 total: 652ms remaining: 1.8s 133: learn: 0.3857634 total: 657ms remaining: 1.79s 134: learn: 0.3855350 total: 661ms remaining: 1.79s 135: learn: 0.3852828 total: 666ms remaining: 1.78s 136: learn: 0.3850812 total: 671ms remaining: 1.78s 137: learn: 0.3848158 total: 675ms remaining: 1.77s 138: learn: 0.3845824 total: 680ms remaining: 1.76s 139: learn: 0.3844295 total: 684ms remaining: 1.76s 140: learn: 0.3838684 total: 689ms remaining: 1.75s 141: learn: 0.3835324 total: 694ms remaining: 1.75s 142: learn: 0.3832876 total: 699ms remaining: 1.74s 143: learn: 0.3830584 total: 703ms remaining: 1.74s 144: learn: 0.3827970 total: 708ms remaining: 1.73s 145: learn: 0.3826102 total: 712ms remaining: 1.73s 146: learn: 0.3824430 total: 717ms remaining: 1.72s 147: learn: 0.3822307 total: 722ms remaining: 1.72s 148: learn: 0.3818330 total: 726ms remaining: 1.71s 149: learn: 0.3816120 total: 731ms remaining: 1.71s 150: learn: 0.3814018 total: 736ms remaining: 1.7s 151: learn: 0.3809961 total: 740ms remaining: 1.7s 152: learn: 0.3802546 total: 745ms remaining: 1.69s 153: learn: 0.3797225 total: 750ms remaining: 1.69s 154: learn: 0.3794987 total: 755ms remaining: 1.68s 155: learn: 0.3791479 total: 759ms remaining: 1.67s 156: learn: 0.3789332 total: 764ms remaining: 1.67s 157: learn: 0.3787225 total: 769ms remaining: 1.66s 158: learn: 0.3785415 total: 774ms remaining: 1.66s 159: learn: 0.3782864 total: 778ms remaining: 1.65s 160: learn: 0.3780342 total: 783ms remaining: 1.65s 161: learn: 0.3778526 total: 787ms remaining: 1.64s 162: learn: 0.3776400 total: 792ms remaining: 1.64s 163: learn: 0.3773102 total: 796ms remaining: 1.63s 164: learn: 0.3769944 total: 801ms remaining: 1.63s 165: learn: 0.3765577 total: 805ms remaining: 1.62s 166: learn: 0.3763370 total: 810ms remaining: 1.61s 167: learn: 0.3759730 total: 815ms remaining: 1.61s 168: learn: 0.3756746 total: 819ms remaining: 1.6s 169: learn: 0.3754188 total: 824ms remaining: 1.6s 170: learn: 0.3752272 total: 828ms remaining: 1.59s 171: learn: 0.3750105 total: 832ms remaining: 1.59s 172: learn: 0.3747266 total: 837ms remaining: 1.58s 173: learn: 0.3743836 total: 841ms remaining: 1.58s 174: learn: 0.3741939 total: 846ms remaining: 1.57s 175: learn: 0.3739720 total: 850ms remaining: 1.56s 176: learn: 0.3737801 total: 855ms remaining: 1.56s 177: learn: 0.3734711 total: 859ms remaining: 1.55s 178: learn: 0.3732889 total: 864ms remaining: 1.55s 179: learn: 0.3726123 total: 868ms remaining: 1.54s 180: learn: 0.3723271 total: 873ms remaining: 1.54s 181: learn: 0.3721449 total: 877ms remaining: 1.53s 182: learn: 0.3718879 total: 882ms remaining: 1.53s 183: learn: 0.3716904 total: 886ms remaining: 1.52s 184: learn: 0.3714175 total: 891ms remaining: 1.52s 185: learn: 0.3711286 total: 895ms remaining: 1.51s 186: learn: 0.3706215 total: 900ms remaining: 1.5s 187: learn: 0.3703787 total: 905ms remaining: 1.5s 188: learn: 0.3700724 total: 910ms remaining: 1.5s 189: learn: 0.3691602 total: 915ms remaining: 1.49s 190: learn: 0.3689259 total: 919ms remaining: 1.49s 191: learn: 0.3684667 total: 924ms remaining: 1.48s 192: learn: 0.3682771 total: 929ms remaining: 1.48s 193: learn: 0.3681055 total: 933ms remaining: 1.47s 194: learn: 0.3678036 total: 938ms remaining: 1.47s 195: learn: 0.3676447 total: 943ms remaining: 1.46s 196: learn: 0.3674059 total: 948ms remaining: 1.46s 197: learn: 0.3671371 total: 953ms remaining: 1.45s 198: learn: 0.3668385 total: 957ms remaining: 1.45s 199: learn: 0.3666176 total: 962ms remaining: 1.44s 200: learn: 0.3664413 total: 966ms remaining: 1.44s 201: learn: 0.3661534 total: 971ms remaining: 1.43s 202: learn: 0.3658395 total: 976ms remaining: 1.43s 203: learn: 0.3653964 total: 980ms remaining: 1.42s 204: learn: 0.3652394 total: 985ms remaining: 1.42s 205: learn: 0.3650271 total: 989ms remaining: 1.41s 206: learn: 0.3648086 total: 994ms remaining: 1.41s 207: learn: 0.3645508 total: 998ms remaining: 1.4s 208: learn: 0.3642864 total: 1s remaining: 1.4s 209: learn: 0.3639472 total: 1.01s remaining: 1.39s 210: learn: 0.3635741 total: 1.01s remaining: 1.39s 211: learn: 0.3631929 total: 1.01s remaining: 1.38s 212: learn: 0.3627399 total: 1.02s remaining: 1.37s 213: learn: 0.3624739 total: 1.02s remaining: 1.37s 214: learn: 0.3622297 total: 1.03s remaining: 1.36s 215: learn: 0.3620023 total: 1.03s remaining: 1.36s 216: learn: 0.3618196 total: 1.04s remaining: 1.35s 217: learn: 0.3614155 total: 1.04s remaining: 1.35s 218: learn: 0.3609358 total: 1.05s remaining: 1.34s 219: learn: 0.3607155 total: 1.05s remaining: 1.34s 220: learn: 0.3604386 total: 1.06s remaining: 1.33s 221: learn: 0.3601874 total: 1.06s remaining: 1.33s 222: learn: 0.3598694 total: 1.06s remaining: 1.32s 223: learn: 0.3596441 total: 1.07s remaining: 1.32s 224: learn: 0.3594202 total: 1.07s remaining: 1.31s 225: learn: 0.3590983 total: 1.08s remaining: 1.31s 226: learn: 0.3588792 total: 1.08s remaining: 1.3s 227: learn: 0.3586156 total: 1.09s remaining: 1.3s 228: learn: 0.3583252 total: 1.09s remaining: 1.29s 229: learn: 0.3580778 total: 1.1s remaining: 1.29s 230: learn: 0.3577989 total: 1.1s remaining: 1.28s 231: learn: 0.3575176 total: 1.1s remaining: 1.28s 232: learn: 0.3572006 total: 1.11s remaining: 1.27s 233: learn: 0.3568898 total: 1.11s remaining: 1.27s 234: learn: 0.3566845 total: 1.12s remaining: 1.26s 235: learn: 0.3563817 total: 1.12s remaining: 1.26s 236: learn: 0.3560629 total: 1.13s remaining: 1.25s 237: learn: 0.3558409 total: 1.13s remaining: 1.25s 238: learn: 0.3555551 total: 1.14s remaining: 1.24s 239: learn: 0.3552568 total: 1.14s remaining: 1.24s 240: learn: 0.3548932 total: 1.15s remaining: 1.23s 241: learn: 0.3546428 total: 1.15s remaining: 1.23s 242: learn: 0.3544089 total: 1.16s remaining: 1.22s 243: learn: 0.3540873 total: 1.16s remaining: 1.22s 244: learn: 0.3538193 total: 1.17s remaining: 1.21s 245: learn: 0.3534859 total: 1.17s remaining: 1.21s 246: learn: 0.3530990 total: 1.18s remaining: 1.2s 247: learn: 0.3528541 total: 1.18s remaining: 1.2s 248: learn: 0.3524647 total: 1.18s remaining: 1.19s 249: learn: 0.3522479 total: 1.19s remaining: 1.19s 250: learn: 0.3520987 total: 1.19s remaining: 1.18s 251: learn: 0.3518770 total: 1.2s remaining: 1.18s 252: learn: 0.3516042 total: 1.2s remaining: 1.17s 253: learn: 0.3514098 total: 1.21s remaining: 1.17s 254: learn: 0.3511992 total: 1.21s remaining: 1.16s 255: learn: 0.3509905 total: 1.22s remaining: 1.16s 256: learn: 0.3507944 total: 1.22s remaining: 1.15s 257: learn: 0.3505689 total: 1.23s remaining: 1.15s 258: learn: 0.3503269 total: 1.23s remaining: 1.14s 259: learn: 0.3501707 total: 1.23s remaining: 1.14s 260: learn: 0.3499277 total: 1.24s remaining: 1.13s 261: learn: 0.3493997 total: 1.24s remaining: 1.13s 262: learn: 0.3491493 total: 1.25s remaining: 1.12s 263: learn: 0.3489810 total: 1.25s remaining: 1.12s 264: learn: 0.3488104 total: 1.26s remaining: 1.11s 265: learn: 0.3485737 total: 1.26s remaining: 1.11s 266: learn: 0.3483035 total: 1.26s remaining: 1.1s 267: learn: 0.3481858 total: 1.27s remaining: 1.1s 268: learn: 0.3479382 total: 1.27s remaining: 1.09s 269: learn: 0.3477118 total: 1.28s remaining: 1.09s 270: learn: 0.3474673 total: 1.28s remaining: 1.08s 271: learn: 0.3472235 total: 1.29s remaining: 1.08s 272: learn: 0.3470084 total: 1.29s remaining: 1.07s 273: learn: 0.3468528 total: 1.3s remaining: 1.07s 274: learn: 0.3466809 total: 1.3s remaining: 1.06s 275: learn: 0.3464293 total: 1.31s remaining: 1.06s 276: learn: 0.3461370 total: 1.31s remaining: 1.06s 277: learn: 0.3459567 total: 1.32s remaining: 1.05s 278: learn: 0.3457771 total: 1.32s remaining: 1.05s 279: learn: 0.3455924 total: 1.33s remaining: 1.04s 280: learn: 0.3453867 total: 1.33s remaining: 1.04s 281: learn: 0.3450915 total: 1.33s remaining: 1.03s 282: learn: 0.3448678 total: 1.34s remaining: 1.03s 283: learn: 0.3445951 total: 1.34s remaining: 1.02s 284: learn: 0.3444689 total: 1.35s remaining: 1.02s 285: learn: 0.3441973 total: 1.35s remaining: 1.01s 286: learn: 0.3439830 total: 1.36s remaining: 1.01s 287: learn: 0.3438204 total: 1.36s remaining: 1s 288: learn: 0.3436561 total: 1.37s remaining: 998ms 289: learn: 0.3434784 total: 1.37s remaining: 993ms 290: learn: 0.3432609 total: 1.38s remaining: 988ms 291: learn: 0.3430424 total: 1.38s remaining: 983ms 292: learn: 0.3428841 total: 1.38s remaining: 978ms 293: learn: 0.3425600 total: 1.39s remaining: 974ms 294: learn: 0.3424239 total: 1.39s remaining: 969ms 295: learn: 0.3422246 total: 1.4s remaining: 964ms 296: learn: 0.3420495 total: 1.4s remaining: 959ms 297: learn: 0.3417844 total: 1.41s remaining: 954ms 298: learn: 0.3416426 total: 1.41s remaining: 949ms 299: learn: 0.3413840 total: 1.42s remaining: 944ms 300: learn: 0.3411010 total: 1.42s remaining: 939ms 301: learn: 0.3407767 total: 1.43s remaining: 934ms 302: learn: 0.3405628 total: 1.43s remaining: 930ms 303: learn: 0.3403627 total: 1.43s remaining: 925ms 304: learn: 0.3401680 total: 1.44s remaining: 920ms 305: learn: 0.3400366 total: 1.44s remaining: 915ms 306: learn: 0.3398456 total: 1.45s remaining: 910ms 307: learn: 0.3396233 total: 1.45s remaining: 905ms 308: learn: 0.3393664 total: 1.46s remaining: 901ms 309: learn: 0.3390886 total: 1.46s remaining: 896ms 310: learn: 0.3389059 total: 1.47s remaining: 892ms 311: learn: 0.3386825 total: 1.47s remaining: 887ms 312: learn: 0.3384965 total: 1.48s remaining: 882ms 313: learn: 0.3382500 total: 1.48s remaining: 877ms 314: learn: 0.3381051 total: 1.49s remaining: 873ms 315: learn: 0.3378937 total: 1.49s remaining: 868ms 316: learn: 0.3376916 total: 1.5s remaining: 863ms 317: learn: 0.3375115 total: 1.5s remaining: 859ms 318: learn: 0.3373210 total: 1.5s remaining: 854ms 319: learn: 0.3371033 total: 1.51s remaining: 849ms 320: learn: 0.3369182 total: 1.51s remaining: 844ms 321: learn: 0.3367748 total: 1.52s remaining: 839ms 322: learn: 0.3366565 total: 1.52s remaining: 835ms 323: learn: 0.3365268 total: 1.53s remaining: 830ms 324: learn: 0.3364213 total: 1.53s remaining: 825ms 325: learn: 0.3362802 total: 1.54s remaining: 820ms 326: learn: 0.3359843 total: 1.54s remaining: 815ms 327: learn: 0.3357722 total: 1.54s remaining: 810ms 328: learn: 0.3355504 total: 1.55s remaining: 805ms 329: learn: 0.3353803 total: 1.55s remaining: 801ms 330: learn: 0.3352275 total: 1.56s remaining: 796ms 331: learn: 0.3350715 total: 1.56s remaining: 791ms 332: learn: 0.3348467 total: 1.57s remaining: 786ms 333: learn: 0.3346739 total: 1.57s remaining: 781ms 334: learn: 0.3344955 total: 1.58s remaining: 776ms 335: learn: 0.3343182 total: 1.58s remaining: 771ms 336: learn: 0.3341957 total: 1.58s remaining: 767ms 337: learn: 0.3339743 total: 1.59s remaining: 762ms 338: learn: 0.3337593 total: 1.59s remaining: 757ms 339: learn: 0.3336233 total: 1.6s remaining: 752ms 340: learn: 0.3334741 total: 1.6s remaining: 747ms 341: learn: 0.3332315 total: 1.61s remaining: 742ms 342: learn: 0.3330434 total: 1.61s remaining: 738ms 343: learn: 0.3328270 total: 1.62s remaining: 733ms 344: learn: 0.3327191 total: 1.62s remaining: 728ms 345: learn: 0.3324520 total: 1.62s remaining: 723ms 346: learn: 0.3322861 total: 1.63s remaining: 718ms 347: learn: 0.3321402 total: 1.63s remaining: 714ms 348: learn: 0.3318896 total: 1.64s remaining: 709ms 349: learn: 0.3317607 total: 1.64s remaining: 704ms 350: learn: 0.3315616 total: 1.65s remaining: 699ms 351: learn: 0.3314236 total: 1.65s remaining: 695ms 352: learn: 0.3312920 total: 1.66s remaining: 690ms 353: learn: 0.3310145 total: 1.66s remaining: 685ms 354: learn: 0.3307983 total: 1.67s remaining: 681ms 355: learn: 0.3306109 total: 1.67s remaining: 676ms 356: learn: 0.3303871 total: 1.68s remaining: 671ms 357: learn: 0.3302206 total: 1.68s remaining: 667ms 358: learn: 0.3300003 total: 1.69s remaining: 662ms 359: learn: 0.3298057 total: 1.69s remaining: 657ms 360: learn: 0.3296643 total: 1.7s remaining: 653ms 361: learn: 0.3294424 total: 1.7s remaining: 648ms 362: learn: 0.3292544 total: 1.7s remaining: 643ms 363: learn: 0.3290123 total: 1.71s remaining: 639ms 364: learn: 0.3287291 total: 1.71s remaining: 634ms 365: learn: 0.3285776 total: 1.72s remaining: 629ms 366: learn: 0.3284752 total: 1.72s remaining: 624ms 367: learn: 0.3282392 total: 1.73s remaining: 620ms 368: learn: 0.3280258 total: 1.73s remaining: 615ms 369: learn: 0.3278825 total: 1.74s remaining: 610ms 370: learn: 0.3276901 total: 1.74s remaining: 605ms 371: learn: 0.3275092 total: 1.75s remaining: 600ms 372: learn: 0.3273203 total: 1.75s remaining: 596ms 373: learn: 0.3271396 total: 1.75s remaining: 591ms 374: learn: 0.3269237 total: 1.76s remaining: 586ms 375: learn: 0.3267089 total: 1.76s remaining: 581ms 376: learn: 0.3265159 total: 1.77s remaining: 577ms 377: learn: 0.3263222 total: 1.77s remaining: 572ms 378: learn: 0.3261838 total: 1.78s remaining: 567ms 379: learn: 0.3260458 total: 1.78s remaining: 562ms 380: learn: 0.3258971 total: 1.78s remaining: 558ms 381: learn: 0.3256985 total: 1.79s remaining: 553ms 382: learn: 0.3255459 total: 1.79s remaining: 548ms 383: learn: 0.3253767 total: 1.8s remaining: 543ms 384: learn: 0.3252537 total: 1.8s remaining: 539ms 385: learn: 0.3250408 total: 1.81s remaining: 534ms 386: learn: 0.3248237 total: 1.81s remaining: 529ms 387: learn: 0.3246429 total: 1.82s remaining: 524ms 388: learn: 0.3245438 total: 1.82s remaining: 520ms 389: learn: 0.3244019 total: 1.82s remaining: 515ms 390: learn: 0.3242587 total: 1.83s remaining: 510ms 391: learn: 0.3241970 total: 1.83s remaining: 505ms 392: learn: 0.3240150 total: 1.84s remaining: 501ms 393: learn: 0.3238193 total: 1.84s remaining: 496ms 394: learn: 0.3236717 total: 1.85s remaining: 491ms 395: learn: 0.3234936 total: 1.85s remaining: 487ms 396: learn: 0.3232674 total: 1.86s remaining: 482ms 397: learn: 0.3230962 total: 1.86s remaining: 477ms 398: learn: 0.3229725 total: 1.87s remaining: 473ms 399: learn: 0.3228008 total: 1.87s remaining: 468ms 400: learn: 0.3226170 total: 1.88s remaining: 463ms 401: learn: 0.3224791 total: 1.88s remaining: 459ms 402: learn: 0.3222931 total: 1.89s remaining: 454ms 403: learn: 0.3220041 total: 1.89s remaining: 449ms 404: learn: 0.3218080 total: 1.9s remaining: 445ms 405: learn: 0.3215786 total: 1.9s remaining: 440ms 406: learn: 0.3213293 total: 1.9s remaining: 435ms 407: learn: 0.3211666 total: 1.91s remaining: 430ms 408: learn: 0.3209909 total: 1.91s remaining: 426ms 409: learn: 0.3209621 total: 1.92s remaining: 421ms 410: learn: 0.3208176 total: 1.92s remaining: 416ms 411: learn: 0.3207185 total: 1.93s remaining: 412ms 412: learn: 0.3205567 total: 1.93s remaining: 407ms 413: learn: 0.3204339 total: 1.94s remaining: 402ms 414: learn: 0.3202224 total: 1.94s remaining: 397ms 415: learn: 0.3201699 total: 1.94s remaining: 393ms 416: learn: 0.3200657 total: 1.95s remaining: 388ms 417: learn: 0.3199241 total: 1.95s remaining: 383ms 418: learn: 0.3197214 total: 1.96s remaining: 378ms 419: learn: 0.3195637 total: 1.96s remaining: 374ms 420: learn: 0.3194919 total: 1.97s remaining: 369ms 421: learn: 0.3193358 total: 1.97s remaining: 364ms 422: learn: 0.3191969 total: 1.98s remaining: 360ms 423: learn: 0.3190577 total: 1.98s remaining: 355ms 424: learn: 0.3189382 total: 1.98s remaining: 350ms 425: learn: 0.3187837 total: 1.99s remaining: 345ms 426: learn: 0.3186229 total: 1.99s remaining: 341ms 427: learn: 0.3184402 total: 2s remaining: 336ms 428: learn: 0.3181934 total: 2s remaining: 331ms 429: learn: 0.3179372 total: 2.01s remaining: 327ms 430: learn: 0.3177930 total: 2.01s remaining: 322ms 431: learn: 0.3176247 total: 2.02s remaining: 317ms 432: learn: 0.3174426 total: 2.02s remaining: 313ms 433: learn: 0.3172850 total: 2.02s remaining: 308ms 434: learn: 0.3171810 total: 2.03s remaining: 303ms 435: learn: 0.3169647 total: 2.03s remaining: 299ms 436: learn: 0.3167284 total: 2.04s remaining: 294ms 437: learn: 0.3166068 total: 2.04s remaining: 289ms 438: learn: 0.3164993 total: 2.05s remaining: 285ms 439: learn: 0.3163399 total: 2.05s remaining: 280ms 440: learn: 0.3162061 total: 2.06s remaining: 275ms 441: learn: 0.3160192 total: 2.06s remaining: 271ms 442: learn: 0.3158436 total: 2.07s remaining: 266ms 443: learn: 0.3156906 total: 2.07s remaining: 261ms 444: learn: 0.3155182 total: 2.08s remaining: 257ms 445: learn: 0.3154327 total: 2.08s remaining: 252ms 446: learn: 0.3152776 total: 2.08s remaining: 247ms 447: learn: 0.3151075 total: 2.09s remaining: 243ms 448: learn: 0.3150200 total: 2.09s remaining: 238ms 449: learn: 0.3148317 total: 2.1s remaining: 233ms 450: learn: 0.3146968 total: 2.1s remaining: 229ms 451: learn: 0.3145142 total: 2.11s remaining: 224ms 452: learn: 0.3142551 total: 2.11s remaining: 219ms 453: learn: 0.3141559 total: 2.12s remaining: 214ms 454: learn: 0.3140553 total: 2.12s remaining: 210ms 455: learn: 0.3139161 total: 2.13s remaining: 205ms 456: learn: 0.3137254 total: 2.13s remaining: 200ms 457: learn: 0.3135849 total: 2.13s remaining: 196ms 458: learn: 0.3134253 total: 2.14s remaining: 191ms 459: learn: 0.3132513 total: 2.14s remaining: 186ms 460: learn: 0.3131640 total: 2.15s remaining: 182ms 461: learn: 0.3130146 total: 2.15s remaining: 177ms 462: learn: 0.3129096 total: 2.16s remaining: 172ms 463: learn: 0.3128294 total: 2.16s remaining: 168ms 464: learn: 0.3126863 total: 2.17s remaining: 163ms 465: learn: 0.3124912 total: 2.17s remaining: 158ms 466: learn: 0.3123493 total: 2.17s remaining: 154ms 467: learn: 0.3121929 total: 2.18s remaining: 149ms 468: learn: 0.3119701 total: 2.18s remaining: 144ms 469: learn: 0.3118240 total: 2.19s remaining: 140ms 470: learn: 0.3116671 total: 2.19s remaining: 135ms 471: learn: 0.3116504 total: 2.2s remaining: 130ms 472: learn: 0.3114849 total: 2.2s remaining: 126ms 473: learn: 0.3113597 total: 2.21s remaining: 121ms 474: learn: 0.3111754 total: 2.21s remaining: 116ms 475: learn: 0.3110146 total: 2.21s remaining: 112ms 476: learn: 0.3108544 total: 2.22s remaining: 107ms 477: learn: 0.3107609 total: 2.23s remaining: 102ms 478: learn: 0.3105628 total: 2.23s remaining: 97.8ms 479: learn: 0.3104318 total: 2.23s remaining: 93.1ms 480: learn: 0.3102733 total: 2.24s remaining: 88.5ms 481: learn: 0.3100673 total: 2.24s remaining: 83.8ms 482: learn: 0.3099569 total: 2.25s remaining: 79.1ms 483: learn: 0.3097728 total: 2.25s remaining: 74.5ms 484: learn: 0.3096936 total: 2.26s remaining: 69.8ms 485: learn: 0.3095276 total: 2.26s remaining: 65.2ms 486: learn: 0.3093053 total: 2.27s remaining: 60.5ms 487: learn: 0.3092215 total: 2.27s remaining: 55.9ms 488: learn: 0.3090390 total: 2.28s remaining: 51.2ms 489: learn: 0.3088935 total: 2.28s remaining: 46.6ms 490: learn: 0.3087968 total: 2.29s remaining: 41.9ms 491: learn: 0.3086630 total: 2.29s remaining: 37.2ms 492: learn: 0.3085718 total: 2.29s remaining: 32.6ms 493: learn: 0.3084535 total: 2.3s remaining: 27.9ms 494: learn: 0.3082034 total: 2.3s remaining: 23.3ms 495: learn: 0.3081250 total: 2.31s remaining: 18.6ms 496: learn: 0.3079642 total: 2.31s remaining: 14ms 497: learn: 0.3078798 total: 2.32s remaining: 9.3ms 498: learn: 0.3077689 total: 2.32s remaining: 4.65ms 499: learn: 0.3076681 total: 2.33s remaining: 0us
<catboost.core.CatBoostClassifier at 0x1bc35c2cfa0>
y_pred_prob = pd.DataFrame(cbc_mod_all.predict_proba(X)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y, y_pred_prob)
print(best_threshold)
tpr = 0.8492462311557789 fpr = 0.11077636152954809 threshold = 0.5299397069432288 0.5299397069432288
param_grid = {
"learning_rate": [0.01, 0.05, 0.1, 0.2, 0.5],
"n_estimators": [50, 100, 150, 200, 250],
'max_depth' : [3, 5, 8, 12]#,
# 'min_samples_leaf' : [1, 3, 5, 8]
}
LGBMClassifier_gridsearch_hp_tuning = GridSearchCV(
LGBMClassifier(),
param_grid=param_grid,
# scoring="accuracy",
n_jobs=-1,
verbose = 2
)
LGBMClassifier_gridsearch_hp_tuning.fit(X_train, y_train)
print(LGBMClassifier_gridsearch_hp_tuning.best_params_)
print("best_score = ", LGBMClassifier_gridsearch_hp_tuning.best_score_)
Fitting 5 folds for each of 100 candidates, totalling 500 fits
{'learning_rate': 0.05, 'max_depth': 12, 'n_estimators': 150}
best_score = 0.8097503478166423
lgbmc_mod= LGBMClassifier(learning_rate = 0.05, max_depth = 12, n_estimators = 150)
lgbmc_mod.fit(X_train, y_train)
print('Acc Train =', round(lgbmc_mod.score(X_train, y_train) * 100, 2))
print('Acc Test =', round(lgbmc_mod.score(X_test, y_test) * 100, 2))
model_acc['LGBMClassifier'] = (round(lgbmc_mod.score(X_test, y_test) * 100, 2), lgbmc_mod)
Acc Train = 88.84 Acc Test = 81.08
y_test_pred_prob = pd.DataFrame(cbc_mod.predict_proba(X_test)[:, 1])
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
roc_curves['LGBMClassifier'] = (fpr, tpr, best_threshold_ind)
plot_roc_curve(fpr, tpr, best_threshold_ind)
tpr = 0.8397212543554007 fpr = 0.1947608200455581 threshold = 0.4969305625069996
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
y_test_pred
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
ACC with threshold 0.497 = 82.17366
lgbmc_mod_all = CatBoostClassifier(n_estimators=100, learning_rate=0.05, max_depth=12) #var1
#{'learning_rate': 0.05, 'max_depth': 12, 'n_estimators': 100}
lgbmc_mod_all.fit(X, y)
0: learn: 0.6673417 total: 115ms remaining: 11.4s 1: learn: 0.6388096 total: 246ms remaining: 12s 2: learn: 0.6120298 total: 378ms remaining: 12.2s 3: learn: 0.5885028 total: 496ms remaining: 11.9s 4: learn: 0.5665156 total: 616ms remaining: 11.7s 5: learn: 0.5489143 total: 754ms remaining: 11.8s 6: learn: 0.5321396 total: 886ms remaining: 11.8s 7: learn: 0.5165535 total: 1.02s remaining: 11.7s 8: learn: 0.5066006 total: 1.15s remaining: 11.6s 9: learn: 0.4955892 total: 1.29s remaining: 11.6s 10: learn: 0.4854435 total: 1.41s remaining: 11.4s 11: learn: 0.4754368 total: 1.53s remaining: 11.2s 12: learn: 0.4654695 total: 1.65s remaining: 11s 13: learn: 0.4566338 total: 1.79s remaining: 11s 14: learn: 0.4508601 total: 1.92s remaining: 10.9s 15: learn: 0.4448802 total: 2.05s remaining: 10.8s 16: learn: 0.4381789 total: 2.19s remaining: 10.7s 17: learn: 0.4326503 total: 2.32s remaining: 10.6s 18: learn: 0.4285685 total: 2.44s remaining: 10.4s 19: learn: 0.4232935 total: 2.56s remaining: 10.2s 20: learn: 0.4184708 total: 2.69s remaining: 10.1s 21: learn: 0.4129457 total: 2.82s remaining: 9.99s 22: learn: 0.4087873 total: 2.95s remaining: 9.87s 23: learn: 0.4032844 total: 3.08s remaining: 9.77s 24: learn: 0.3990583 total: 3.22s remaining: 9.65s 25: learn: 0.3938664 total: 3.36s remaining: 9.55s 26: learn: 0.3909633 total: 3.48s remaining: 9.41s 27: learn: 0.3879497 total: 3.6s remaining: 9.27s 28: learn: 0.3838459 total: 3.73s remaining: 9.13s 29: learn: 0.3810334 total: 3.86s remaining: 9s 30: learn: 0.3778933 total: 3.98s remaining: 8.86s 31: learn: 0.3751706 total: 4.11s remaining: 8.73s 32: learn: 0.3721321 total: 4.23s remaining: 8.58s 33: learn: 0.3691462 total: 4.35s remaining: 8.44s 34: learn: 0.3666209 total: 4.47s remaining: 8.3s 35: learn: 0.3650299 total: 4.59s remaining: 8.15s 36: learn: 0.3631869 total: 4.71s remaining: 8.02s 37: learn: 0.3603522 total: 4.85s remaining: 7.92s 38: learn: 0.3586432 total: 4.97s remaining: 7.78s 39: learn: 0.3563908 total: 5.1s remaining: 7.65s 40: learn: 0.3542448 total: 5.22s remaining: 7.52s 41: learn: 0.3526462 total: 5.36s remaining: 7.4s 42: learn: 0.3500569 total: 5.49s remaining: 7.28s 43: learn: 0.3485598 total: 5.61s remaining: 7.14s 44: learn: 0.3464365 total: 5.73s remaining: 7s 45: learn: 0.3452918 total: 5.85s remaining: 6.87s 46: learn: 0.3442342 total: 5.97s remaining: 6.73s 47: learn: 0.3428338 total: 6.09s remaining: 6.6s 48: learn: 0.3412422 total: 6.21s remaining: 6.47s 49: learn: 0.3391832 total: 6.34s remaining: 6.34s 50: learn: 0.3362108 total: 6.46s remaining: 6.21s 51: learn: 0.3345792 total: 6.58s remaining: 6.08s 52: learn: 0.3336008 total: 6.7s remaining: 5.94s 53: learn: 0.3310233 total: 6.83s remaining: 5.82s 54: learn: 0.3286172 total: 6.97s remaining: 5.7s 55: learn: 0.3264156 total: 7.09s remaining: 5.57s 56: learn: 0.3241421 total: 7.22s remaining: 5.44s 57: learn: 0.3223466 total: 7.34s remaining: 5.32s 58: learn: 0.3205128 total: 7.46s remaining: 5.19s 59: learn: 0.3194264 total: 7.59s remaining: 5.06s 60: learn: 0.3173504 total: 7.71s remaining: 4.93s 61: learn: 0.3157690 total: 7.84s remaining: 4.8s 62: learn: 0.3150136 total: 7.97s remaining: 4.68s 63: learn: 0.3141026 total: 8.1s remaining: 4.55s 64: learn: 0.3132157 total: 8.22s remaining: 4.43s 65: learn: 0.3113453 total: 8.36s remaining: 4.3s 66: learn: 0.3098632 total: 8.47s remaining: 4.17s 67: learn: 0.3086519 total: 8.6s remaining: 4.04s 68: learn: 0.3079598 total: 8.72s remaining: 3.92s 69: learn: 0.3065707 total: 8.84s remaining: 3.79s 70: learn: 0.3053167 total: 8.95s remaining: 3.66s 71: learn: 0.3042642 total: 9.08s remaining: 3.53s 72: learn: 0.3030929 total: 9.21s remaining: 3.41s 73: learn: 0.3019901 total: 9.36s remaining: 3.29s 74: learn: 0.3006424 total: 9.47s remaining: 3.16s 75: learn: 0.3000320 total: 9.6s remaining: 3.03s 76: learn: 0.2990030 total: 9.73s remaining: 2.91s 77: learn: 0.2976731 total: 9.87s remaining: 2.78s 78: learn: 0.2967026 total: 9.99s remaining: 2.66s 79: learn: 0.2955392 total: 10.1s remaining: 2.53s 80: learn: 0.2949240 total: 10.3s remaining: 2.41s 81: learn: 0.2923005 total: 10.4s remaining: 2.28s 82: learn: 0.2907026 total: 10.5s remaining: 2.15s 83: learn: 0.2898413 total: 10.6s remaining: 2.03s 84: learn: 0.2884702 total: 10.8s remaining: 1.9s 85: learn: 0.2873515 total: 10.9s remaining: 1.77s 86: learn: 0.2862590 total: 11s remaining: 1.65s 87: learn: 0.2850284 total: 11.2s remaining: 1.52s 88: learn: 0.2839517 total: 11.3s remaining: 1.4s 89: learn: 0.2830780 total: 11.4s remaining: 1.27s 90: learn: 0.2819593 total: 11.5s remaining: 1.14s 91: learn: 0.2806480 total: 11.7s remaining: 1.01s 92: learn: 0.2801820 total: 11.8s remaining: 889ms 93: learn: 0.2786975 total: 11.9s remaining: 762ms 94: learn: 0.2774695 total: 12.1s remaining: 635ms 95: learn: 0.2767360 total: 12.2s remaining: 508ms 96: learn: 0.2759419 total: 12.3s remaining: 382ms 97: learn: 0.2746018 total: 12.5s remaining: 254ms 98: learn: 0.2733997 total: 12.6s remaining: 127ms 99: learn: 0.2727936 total: 12.7s remaining: 0us
<catboost.core.CatBoostClassifier at 0x1bc35c3cfd0>
for m in model_acc.keys():
print(m, model_acc[m][0])
LogisticRegression 78.32 DecisionTreeClassifier 78.55 RandomForestClassifier 80.85 GradientBoostingClassifier 81.14 SVC 79.64 KNeighborsClassifier 76.6 XGBClassifier 80.56 CatBoostClassifier 82.17 LGBMClassifier 81.08
fig, axs = plt.subplots(figsize=(15, 8))
colors= []
plt.plot([0, 1], [0, 1], linestyle='dashed', color='red')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC curve')
for model in roc_curves.keys():
plt.plot(roc_curves[model][0], roc_curves[model][1], label=model)
axs.legend()
roc_curves['DecisionTreeClassifier'][0]
array([0. , 0. , 0.0022779 , 0.00341686, 0.00455581,
0.00569476, 0.01252847, 0.02505695, 0.04214123, 0.05922551,
0.06036446, 0.07630979, 0.09339408, 0.12186788, 0.1309795 ,
0.14578588, 0.14920273, 0.18451025, 0.22209567, 0.26537585,
0.26537585, 0.27790433, 0.29157175, 0.3405467 , 0.34624146,
0.72665148, 0.77448747, 0.95216401, 0.97494305, 1. ])
# LGBMClassifier 81.08
# GradientBoostingClassifier 81.14
# CatBoostClassifier 82.17
# xgb_mod = XGBClassifier(n_estimators=250, learning_rate=0.1, max_depth=3)
# xgb_mod.fit(X_train, y_train)
y_test_pred_gb_mod = gb_mod.predict_proba(X_test)[:, 1]
y_test_pred_cbc_mod = cbc_mod.predict_proba(X_test)[:, 1]
y_test_pred_lgbmc_mod = lgbmc_mod.predict_proba(X_test)[:, 1]
y_test_pred_prob = pd.DataFrame((y_test_pred_gb_mod + y_test_pred_cbc_mod + y_test_pred_lgbmc_mod) / 3)
fpr, tpr, best_threshold_ind, best_threshold = get_roc_curve(y_test, y_test_pred_prob)
y_test_pred = y_test_pred_prob.applymap(lambda x: 1 if x > best_threshold else 0)
print('ACC with threshold', round(best_threshold, 3), '=', round(accuracy_score(y_test, y_test_pred) *100, 5))
tpr = 0.8315911730545877 fpr = 0.19248291571753987 threshold = 0.5081899358189764 ACC with threshold 0.508 = 81.88614
0: learn: 0.6717601 total: 11.8ms remaining: 5.91s 1: learn: 0.6559285 total: 20.4ms remaining: 5.07s 2: learn: 0.6367266 total: 25.1ms remaining: 4.16s 3: learn: 0.6199487 total: 29.5ms remaining: 3.66s 4: learn: 0.6064402 total: 33.7ms remaining: 3.33s 5: learn: 0.5930960 total: 38.3ms remaining: 3.15s 6: learn: 0.5824253 total: 42.8ms remaining: 3.02s 7: learn: 0.5713423 total: 47.4ms remaining: 2.92s 8: learn: 0.5628770 total: 51.9ms remaining: 2.83s 9: learn: 0.5510038 total: 56.5ms remaining: 2.77s 10: learn: 0.5445644 total: 60.6ms remaining: 2.69s 11: learn: 0.5371470 total: 65.1ms remaining: 2.65s 12: learn: 0.5298268 total: 69.7ms remaining: 2.61s 13: learn: 0.5214459 total: 74.3ms remaining: 2.58s 14: learn: 0.5153465 total: 78.8ms remaining: 2.55s 15: learn: 0.5113514 total: 83.4ms remaining: 2.52s 16: learn: 0.5073899 total: 87.9ms remaining: 2.5s 17: learn: 0.5030747 total: 92.5ms remaining: 2.48s 18: learn: 0.4994706 total: 97ms remaining: 2.46s 19: learn: 0.4945849 total: 102ms remaining: 2.44s 20: learn: 0.4913744 total: 106ms remaining: 2.43s 21: learn: 0.4887091 total: 111ms remaining: 2.41s 22: learn: 0.4857981 total: 115ms remaining: 2.39s 23: learn: 0.4825017 total: 120ms remaining: 2.38s 24: learn: 0.4786623 total: 125ms remaining: 2.37s 25: learn: 0.4768492 total: 129ms remaining: 2.36s 26: learn: 0.4734889 total: 134ms remaining: 2.35s 27: learn: 0.4710118 total: 139ms remaining: 2.34s 28: learn: 0.4684620 total: 144ms remaining: 2.33s 29: learn: 0.4662380 total: 148ms remaining: 2.32s 30: learn: 0.4646370 total: 153ms remaining: 2.31s 31: learn: 0.4631152 total: 158ms remaining: 2.31s 32: learn: 0.4609342 total: 162ms remaining: 2.3s 33: learn: 0.4592114 total: 167ms remaining: 2.29s 34: learn: 0.4569889 total: 172ms remaining: 2.28s 35: learn: 0.4545138 total: 177ms remaining: 2.27s 36: learn: 0.4531351 total: 181ms remaining: 2.27s 37: learn: 0.4511333 total: 186ms remaining: 2.26s 38: learn: 0.4500232 total: 191ms remaining: 2.26s 39: learn: 0.4485887 total: 196ms remaining: 2.25s 40: learn: 0.4469055 total: 201ms remaining: 2.25s 41: learn: 0.4453079 total: 206ms remaining: 2.24s 42: learn: 0.4442801 total: 210ms remaining: 2.24s 43: learn: 0.4430573 total: 215ms remaining: 2.23s 44: learn: 0.4417083 total: 220ms remaining: 2.22s 45: learn: 0.4399265 total: 225ms remaining: 2.22s 46: learn: 0.4383401 total: 230ms remaining: 2.21s 47: learn: 0.4373329 total: 235ms remaining: 2.21s 48: learn: 0.4364893 total: 240ms remaining: 2.21s 49: learn: 0.4356206 total: 245ms remaining: 2.21s 50: learn: 0.4342541 total: 250ms remaining: 2.2s 51: learn: 0.4335778 total: 255ms remaining: 2.19s 52: learn: 0.4320319 total: 259ms remaining: 2.19s 53: learn: 0.4309256 total: 264ms remaining: 2.18s 54: learn: 0.4299535 total: 268ms remaining: 2.17s 55: learn: 0.4285534 total: 273ms remaining: 2.17s 56: learn: 0.4275038 total: 278ms remaining: 2.16s 57: learn: 0.4263364 total: 282ms remaining: 2.15s 58: learn: 0.4256917 total: 287ms remaining: 2.14s 59: learn: 0.4252728 total: 291ms remaining: 2.14s 60: learn: 0.4249211 total: 296ms remaining: 2.13s 61: learn: 0.4242374 total: 301ms remaining: 2.12s 62: learn: 0.4231913 total: 305ms remaining: 2.12s 63: learn: 0.4224360 total: 310ms remaining: 2.11s 64: learn: 0.4217434 total: 314ms remaining: 2.1s 65: learn: 0.4211012 total: 319ms remaining: 2.1s 66: learn: 0.4201431 total: 323ms remaining: 2.09s 67: learn: 0.4187252 total: 328ms remaining: 2.08s 68: learn: 0.4175828 total: 333ms remaining: 2.08s 69: learn: 0.4167893 total: 337ms remaining: 2.07s 70: learn: 0.4160514 total: 343ms remaining: 2.07s 71: learn: 0.4152627 total: 347ms remaining: 2.06s 72: learn: 0.4145157 total: 352ms remaining: 2.06s 73: learn: 0.4136453 total: 357ms remaining: 2.05s 74: learn: 0.4125564 total: 362ms remaining: 2.05s 75: learn: 0.4116931 total: 366ms remaining: 2.04s 76: learn: 0.4110673 total: 373ms remaining: 2.05s 77: learn: 0.4106763 total: 377ms remaining: 2.04s 78: learn: 0.4099986 total: 382ms remaining: 2.04s 79: learn: 0.4095265 total: 387ms remaining: 2.03s 80: learn: 0.4089138 total: 392ms remaining: 2.03s 81: learn: 0.4084111 total: 396ms remaining: 2.02s 82: learn: 0.4074390 total: 401ms remaining: 2.01s 83: learn: 0.4069154 total: 406ms remaining: 2.01s 84: learn: 0.4064620 total: 411ms remaining: 2s 85: learn: 0.4059731 total: 416ms remaining: 2s 86: learn: 0.4057280 total: 421ms remaining: 2s 87: learn: 0.4052259 total: 426ms remaining: 1.99s 88: learn: 0.4046340 total: 431ms remaining: 1.99s 89: learn: 0.4039782 total: 435ms remaining: 1.98s 90: learn: 0.4032069 total: 440ms remaining: 1.98s 91: learn: 0.4026369 total: 445ms remaining: 1.97s 92: learn: 0.4019732 total: 449ms remaining: 1.97s 93: learn: 0.4015387 total: 454ms remaining: 1.96s 94: learn: 0.4009854 total: 459ms remaining: 1.96s 95: learn: 0.4006115 total: 464ms remaining: 1.95s 96: learn: 0.3994422 total: 469ms remaining: 1.95s 97: learn: 0.3987352 total: 474ms remaining: 1.94s 98: learn: 0.3982851 total: 479ms remaining: 1.94s 99: learn: 0.3979062 total: 483ms remaining: 1.93s 100: learn: 0.3974923 total: 488ms remaining: 1.93s 101: learn: 0.3972004 total: 493ms remaining: 1.92s 102: learn: 0.3968142 total: 497ms remaining: 1.92s 103: learn: 0.3965742 total: 502ms remaining: 1.91s 104: learn: 0.3963670 total: 507ms remaining: 1.91s 105: learn: 0.3960710 total: 512ms remaining: 1.9s 106: learn: 0.3955506 total: 517ms remaining: 1.9s 107: learn: 0.3949300 total: 521ms remaining: 1.89s 108: learn: 0.3943364 total: 526ms remaining: 1.89s 109: learn: 0.3937671 total: 530ms remaining: 1.88s 110: learn: 0.3935486 total: 535ms remaining: 1.88s 111: learn: 0.3931848 total: 540ms remaining: 1.87s 112: learn: 0.3927611 total: 544ms remaining: 1.86s 113: learn: 0.3924328 total: 549ms remaining: 1.86s 114: learn: 0.3920901 total: 554ms remaining: 1.85s 115: learn: 0.3915730 total: 559ms remaining: 1.85s 116: learn: 0.3912976 total: 564ms remaining: 1.84s 117: learn: 0.3909855 total: 569ms remaining: 1.84s 118: learn: 0.3906130 total: 573ms remaining: 1.83s 119: learn: 0.3903298 total: 578ms remaining: 1.83s 120: learn: 0.3900764 total: 583ms remaining: 1.82s 121: learn: 0.3898187 total: 588ms remaining: 1.82s 122: learn: 0.3894357 total: 593ms remaining: 1.82s 123: learn: 0.3889836 total: 598ms remaining: 1.81s 124: learn: 0.3885785 total: 603ms remaining: 1.81s 125: learn: 0.3882237 total: 607ms remaining: 1.8s 126: learn: 0.3880068 total: 612ms remaining: 1.8s 127: learn: 0.3877897 total: 617ms remaining: 1.79s 128: learn: 0.3875918 total: 621ms remaining: 1.79s 129: learn: 0.3869848 total: 626ms remaining: 1.78s 130: learn: 0.3864431 total: 630ms remaining: 1.77s 131: learn: 0.3862427 total: 635ms remaining: 1.77s 132: learn: 0.3860463 total: 640ms remaining: 1.77s 133: learn: 0.3857634 total: 645ms remaining: 1.76s 134: learn: 0.3855350 total: 650ms remaining: 1.76s 135: learn: 0.3852828 total: 655ms remaining: 1.75s 136: learn: 0.3850812 total: 660ms remaining: 1.75s 137: learn: 0.3848158 total: 664ms remaining: 1.74s 138: learn: 0.3845824 total: 669ms remaining: 1.74s 139: learn: 0.3844295 total: 673ms remaining: 1.73s 140: learn: 0.3838684 total: 678ms remaining: 1.72s 141: learn: 0.3835324 total: 682ms remaining: 1.72s 142: learn: 0.3832876 total: 686ms remaining: 1.71s 143: learn: 0.3830584 total: 691ms remaining: 1.71s 144: learn: 0.3827970 total: 695ms remaining: 1.7s 145: learn: 0.3826102 total: 700ms remaining: 1.7s 146: learn: 0.3824430 total: 704ms remaining: 1.69s 147: learn: 0.3822307 total: 709ms remaining: 1.69s 148: learn: 0.3818330 total: 714ms remaining: 1.68s 149: learn: 0.3816120 total: 718ms remaining: 1.68s 150: learn: 0.3814018 total: 722ms remaining: 1.67s 151: learn: 0.3809961 total: 727ms remaining: 1.66s 152: learn: 0.3802546 total: 731ms remaining: 1.66s 153: learn: 0.3797225 total: 736ms remaining: 1.65s 154: learn: 0.3794987 total: 741ms remaining: 1.65s 155: learn: 0.3791479 total: 746ms remaining: 1.64s 156: learn: 0.3789332 total: 751ms remaining: 1.64s 157: learn: 0.3787225 total: 755ms remaining: 1.64s 158: learn: 0.3785415 total: 760ms remaining: 1.63s 159: learn: 0.3782864 total: 765ms remaining: 1.63s 160: learn: 0.3780342 total: 769ms remaining: 1.62s 161: learn: 0.3778526 total: 774ms remaining: 1.61s 162: learn: 0.3776400 total: 779ms remaining: 1.61s 163: learn: 0.3773102 total: 784ms remaining: 1.61s 164: learn: 0.3769944 total: 789ms remaining: 1.6s 165: learn: 0.3765577 total: 794ms remaining: 1.6s 166: learn: 0.3763370 total: 798ms remaining: 1.59s 167: learn: 0.3759730 total: 803ms remaining: 1.59s 168: learn: 0.3756746 total: 808ms remaining: 1.58s 169: learn: 0.3754188 total: 813ms remaining: 1.58s 170: learn: 0.3752272 total: 817ms remaining: 1.57s 171: learn: 0.3750105 total: 822ms remaining: 1.57s 172: learn: 0.3747266 total: 826ms remaining: 1.56s 173: learn: 0.3743836 total: 831ms remaining: 1.56s 174: learn: 0.3741939 total: 835ms remaining: 1.55s 175: learn: 0.3739720 total: 840ms remaining: 1.55s 176: learn: 0.3737801 total: 845ms remaining: 1.54s 177: learn: 0.3734711 total: 849ms remaining: 1.54s 178: learn: 0.3732889 total: 854ms remaining: 1.53s 179: learn: 0.3726123 total: 859ms remaining: 1.53s 180: learn: 0.3723271 total: 864ms remaining: 1.52s 181: learn: 0.3721449 total: 868ms remaining: 1.52s 182: learn: 0.3718879 total: 873ms remaining: 1.51s 183: learn: 0.3716904 total: 877ms remaining: 1.51s 184: learn: 0.3714175 total: 882ms remaining: 1.5s 185: learn: 0.3711286 total: 887ms remaining: 1.5s 186: learn: 0.3706215 total: 891ms remaining: 1.49s 187: learn: 0.3703787 total: 895ms remaining: 1.49s 188: learn: 0.3700724 total: 900ms remaining: 1.48s 189: learn: 0.3691602 total: 905ms remaining: 1.48s 190: learn: 0.3689259 total: 909ms remaining: 1.47s 191: learn: 0.3684667 total: 914ms remaining: 1.47s 192: learn: 0.3682771 total: 918ms remaining: 1.46s 193: learn: 0.3681055 total: 923ms remaining: 1.46s 194: learn: 0.3678036 total: 928ms remaining: 1.45s 195: learn: 0.3676447 total: 933ms remaining: 1.45s 196: learn: 0.3674059 total: 937ms remaining: 1.44s 197: learn: 0.3671371 total: 942ms remaining: 1.44s 198: learn: 0.3668385 total: 947ms remaining: 1.43s 199: learn: 0.3666176 total: 952ms remaining: 1.43s 200: learn: 0.3664413 total: 956ms remaining: 1.42s 201: learn: 0.3661534 total: 961ms remaining: 1.42s 202: learn: 0.3658395 total: 965ms remaining: 1.41s 203: learn: 0.3653964 total: 970ms remaining: 1.41s 204: learn: 0.3652394 total: 975ms remaining: 1.4s 205: learn: 0.3650271 total: 980ms remaining: 1.4s 206: learn: 0.3648086 total: 984ms remaining: 1.39s 207: learn: 0.3645508 total: 989ms remaining: 1.39s 208: learn: 0.3642864 total: 993ms remaining: 1.38s 209: learn: 0.3639472 total: 998ms remaining: 1.38s 210: learn: 0.3635741 total: 1s remaining: 1.37s 211: learn: 0.3631929 total: 1.01s remaining: 1.37s 212: learn: 0.3627399 total: 1.01s remaining: 1.36s 213: learn: 0.3624739 total: 1.01s remaining: 1.36s 214: learn: 0.3622297 total: 1.02s remaining: 1.35s 215: learn: 0.3620023 total: 1.02s remaining: 1.35s 216: learn: 0.3618196 total: 1.03s remaining: 1.34s 217: learn: 0.3614155 total: 1.03s remaining: 1.34s 218: learn: 0.3609358 total: 1.04s remaining: 1.33s 219: learn: 0.3607155 total: 1.04s remaining: 1.33s 220: learn: 0.3604386 total: 1.05s remaining: 1.32s 221: learn: 0.3601874 total: 1.05s remaining: 1.32s 222: learn: 0.3598694 total: 1.06s remaining: 1.31s 223: learn: 0.3596441 total: 1.06s remaining: 1.31s 224: learn: 0.3594202 total: 1.06s remaining: 1.3s 225: learn: 0.3590983 total: 1.07s remaining: 1.3s 226: learn: 0.3588792 total: 1.07s remaining: 1.29s 227: learn: 0.3586156 total: 1.08s remaining: 1.29s 228: learn: 0.3583252 total: 1.08s remaining: 1.28s 229: learn: 0.3580778 total: 1.09s remaining: 1.28s 230: learn: 0.3577989 total: 1.09s remaining: 1.27s 231: learn: 0.3575176 total: 1.1s remaining: 1.27s 232: learn: 0.3572006 total: 1.1s remaining: 1.26s 233: learn: 0.3568898 total: 1.1s remaining: 1.26s 234: learn: 0.3566845 total: 1.11s remaining: 1.25s 235: learn: 0.3563817 total: 1.11s remaining: 1.25s 236: learn: 0.3560629 total: 1.12s remaining: 1.24s 237: learn: 0.3558409 total: 1.12s remaining: 1.24s 238: learn: 0.3555551 total: 1.13s remaining: 1.23s 239: learn: 0.3552568 total: 1.13s remaining: 1.23s 240: learn: 0.3548932 total: 1.14s remaining: 1.22s 241: learn: 0.3546428 total: 1.14s remaining: 1.22s 242: learn: 0.3544089 total: 1.15s remaining: 1.21s 243: learn: 0.3540873 total: 1.15s remaining: 1.21s 244: learn: 0.3538193 total: 1.16s remaining: 1.2s 245: learn: 0.3534859 total: 1.16s remaining: 1.2s 246: learn: 0.3530990 total: 1.17s remaining: 1.2s 247: learn: 0.3528541 total: 1.17s remaining: 1.19s 248: learn: 0.3524647 total: 1.18s remaining: 1.19s 249: learn: 0.3522479 total: 1.18s remaining: 1.18s 250: learn: 0.3520987 total: 1.18s remaining: 1.18s 251: learn: 0.3518770 total: 1.19s remaining: 1.17s 252: learn: 0.3516042 total: 1.19s remaining: 1.17s 253: learn: 0.3514098 total: 1.2s remaining: 1.16s 254: learn: 0.3511992 total: 1.2s remaining: 1.16s 255: learn: 0.3509905 total: 1.21s remaining: 1.15s 256: learn: 0.3507944 total: 1.21s remaining: 1.15s 257: learn: 0.3505689 total: 1.22s remaining: 1.14s 258: learn: 0.3503269 total: 1.22s remaining: 1.14s 259: learn: 0.3501707 total: 1.23s remaining: 1.13s 260: learn: 0.3499277 total: 1.23s remaining: 1.13s 261: learn: 0.3493997 total: 1.23s remaining: 1.12s 262: learn: 0.3491493 total: 1.24s remaining: 1.12s 263: learn: 0.3489810 total: 1.24s remaining: 1.11s 264: learn: 0.3488104 total: 1.25s remaining: 1.11s 265: learn: 0.3485737 total: 1.25s remaining: 1.1s 266: learn: 0.3483035 total: 1.26s remaining: 1.1s 267: learn: 0.3481858 total: 1.26s remaining: 1.09s 268: learn: 0.3479382 total: 1.26s remaining: 1.09s 269: learn: 0.3477118 total: 1.27s remaining: 1.08s 270: learn: 0.3474673 total: 1.27s remaining: 1.08s 271: learn: 0.3472235 total: 1.28s remaining: 1.07s 272: learn: 0.3470084 total: 1.28s remaining: 1.07s 273: learn: 0.3468528 total: 1.29s remaining: 1.06s 274: learn: 0.3466809 total: 1.29s remaining: 1.06s 275: learn: 0.3464293 total: 1.3s remaining: 1.05s 276: learn: 0.3461370 total: 1.3s remaining: 1.05s 277: learn: 0.3459567 total: 1.31s remaining: 1.04s 278: learn: 0.3457771 total: 1.31s remaining: 1.04s 279: learn: 0.3455924 total: 1.32s remaining: 1.03s 280: learn: 0.3453867 total: 1.32s remaining: 1.03s 281: learn: 0.3450915 total: 1.32s remaining: 1.02s 282: learn: 0.3448678 total: 1.33s remaining: 1.02s 283: learn: 0.3445951 total: 1.33s remaining: 1.01s 284: learn: 0.3444689 total: 1.34s remaining: 1.01s 285: learn: 0.3441973 total: 1.34s remaining: 1.01s 286: learn: 0.3439830 total: 1.35s remaining: 1s 287: learn: 0.3438204 total: 1.35s remaining: 997ms 288: learn: 0.3436561 total: 1.36s remaining: 992ms 289: learn: 0.3434784 total: 1.36s remaining: 987ms 290: learn: 0.3432609 total: 1.37s remaining: 983ms 291: learn: 0.3430424 total: 1.37s remaining: 978ms 292: learn: 0.3428841 total: 1.38s remaining: 973ms 293: learn: 0.3425600 total: 1.38s remaining: 968ms 294: learn: 0.3424239 total: 1.39s remaining: 963ms 295: learn: 0.3422246 total: 1.39s remaining: 958ms 296: learn: 0.3420495 total: 1.39s remaining: 953ms 297: learn: 0.3417844 total: 1.4s remaining: 949ms 298: learn: 0.3416426 total: 1.4s remaining: 944ms 299: learn: 0.3413840 total: 1.41s remaining: 939ms 300: learn: 0.3411010 total: 1.41s remaining: 934ms 301: learn: 0.3407767 total: 1.42s remaining: 929ms 302: learn: 0.3405628 total: 1.42s remaining: 925ms 303: learn: 0.3403627 total: 1.43s remaining: 920ms 304: learn: 0.3401680 total: 1.43s remaining: 915ms 305: learn: 0.3400366 total: 1.44s remaining: 910ms 306: learn: 0.3398456 total: 1.44s remaining: 905ms 307: learn: 0.3396233 total: 1.44s remaining: 900ms 308: learn: 0.3393664 total: 1.45s remaining: 896ms 309: learn: 0.3390886 total: 1.45s remaining: 891ms 310: learn: 0.3389059 total: 1.46s remaining: 886ms 311: learn: 0.3386825 total: 1.46s remaining: 881ms 312: learn: 0.3384965 total: 1.47s remaining: 876ms 313: learn: 0.3382500 total: 1.47s remaining: 872ms 314: learn: 0.3381051 total: 1.48s remaining: 867ms 315: learn: 0.3378937 total: 1.48s remaining: 862ms 316: learn: 0.3376916 total: 1.48s remaining: 857ms 317: learn: 0.3375115 total: 1.49s remaining: 852ms 318: learn: 0.3373210 total: 1.49s remaining: 848ms 319: learn: 0.3371033 total: 1.5s remaining: 843ms 320: learn: 0.3369182 total: 1.5s remaining: 839ms 321: learn: 0.3367748 total: 1.51s remaining: 834ms 322: learn: 0.3366565 total: 1.51s remaining: 829ms 323: learn: 0.3365268 total: 1.52s remaining: 824ms 324: learn: 0.3364213 total: 1.52s remaining: 820ms 325: learn: 0.3362802 total: 1.53s remaining: 815ms 326: learn: 0.3359843 total: 1.53s remaining: 811ms 327: learn: 0.3357722 total: 1.54s remaining: 806ms 328: learn: 0.3355504 total: 1.54s remaining: 801ms 329: learn: 0.3353803 total: 1.54s remaining: 796ms 330: learn: 0.3352275 total: 1.55s remaining: 791ms 331: learn: 0.3350715 total: 1.55s remaining: 787ms 332: learn: 0.3348467 total: 1.56s remaining: 782ms 333: learn: 0.3346739 total: 1.56s remaining: 777ms 334: learn: 0.3344955 total: 1.57s remaining: 772ms 335: learn: 0.3343182 total: 1.57s remaining: 768ms 336: learn: 0.3341957 total: 1.58s remaining: 763ms 337: learn: 0.3339743 total: 1.58s remaining: 758ms 338: learn: 0.3337593 total: 1.58s remaining: 753ms 339: learn: 0.3336233 total: 1.59s remaining: 748ms 340: learn: 0.3334741 total: 1.59s remaining: 744ms 341: learn: 0.3332315 total: 1.6s remaining: 739ms 342: learn: 0.3330434 total: 1.6s remaining: 734ms 343: learn: 0.3328270 total: 1.61s remaining: 729ms 344: learn: 0.3327191 total: 1.61s remaining: 725ms 345: learn: 0.3324520 total: 1.62s remaining: 720ms 346: learn: 0.3322861 total: 1.62s remaining: 715ms 347: learn: 0.3321402 total: 1.63s remaining: 710ms 348: learn: 0.3318896 total: 1.63s remaining: 706ms 349: learn: 0.3317607 total: 1.64s remaining: 701ms 350: learn: 0.3315616 total: 1.64s remaining: 696ms 351: learn: 0.3314236 total: 1.64s remaining: 691ms 352: learn: 0.3312920 total: 1.65s remaining: 686ms 353: learn: 0.3310145 total: 1.65s remaining: 682ms 354: learn: 0.3307983 total: 1.66s remaining: 677ms 355: learn: 0.3306109 total: 1.66s remaining: 672ms 356: learn: 0.3303871 total: 1.67s remaining: 667ms 357: learn: 0.3302206 total: 1.67s remaining: 663ms 358: learn: 0.3300003 total: 1.68s remaining: 658ms 359: learn: 0.3298057 total: 1.68s remaining: 653ms 360: learn: 0.3296643 total: 1.68s remaining: 648ms 361: learn: 0.3294424 total: 1.69s remaining: 644ms 362: learn: 0.3292544 total: 1.69s remaining: 639ms 363: learn: 0.3290123 total: 1.7s remaining: 635ms 364: learn: 0.3287291 total: 1.7s remaining: 630ms 365: learn: 0.3285776 total: 1.71s remaining: 625ms 366: learn: 0.3284752 total: 1.71s remaining: 621ms 367: learn: 0.3282392 total: 1.72s remaining: 616ms 368: learn: 0.3280258 total: 1.72s remaining: 611ms 369: learn: 0.3278825 total: 1.73s remaining: 607ms 370: learn: 0.3276901 total: 1.73s remaining: 602ms 371: learn: 0.3275092 total: 1.74s remaining: 598ms 372: learn: 0.3273203 total: 1.74s remaining: 593ms 373: learn: 0.3271396 total: 1.75s remaining: 588ms 374: learn: 0.3269237 total: 1.75s remaining: 583ms 375: learn: 0.3267089 total: 1.75s remaining: 579ms 376: learn: 0.3265159 total: 1.76s remaining: 574ms 377: learn: 0.3263222 total: 1.76s remaining: 569ms 378: learn: 0.3261838 total: 1.77s remaining: 565ms 379: learn: 0.3260458 total: 1.77s remaining: 560ms 380: learn: 0.3258971 total: 1.78s remaining: 555ms 381: learn: 0.3256985 total: 1.78s remaining: 551ms 382: learn: 0.3255459 total: 1.79s remaining: 546ms 383: learn: 0.3253767 total: 1.79s remaining: 541ms 384: learn: 0.3252537 total: 1.79s remaining: 536ms 385: learn: 0.3250408 total: 1.8s remaining: 532ms 386: learn: 0.3248237 total: 1.8s remaining: 527ms 387: learn: 0.3246429 total: 1.81s remaining: 522ms 388: learn: 0.3245438 total: 1.81s remaining: 517ms 389: learn: 0.3244019 total: 1.82s remaining: 513ms 390: learn: 0.3242587 total: 1.82s remaining: 508ms 391: learn: 0.3241970 total: 1.83s remaining: 503ms 392: learn: 0.3240150 total: 1.83s remaining: 499ms 393: learn: 0.3238193 total: 1.83s remaining: 494ms 394: learn: 0.3236717 total: 1.84s remaining: 489ms 395: learn: 0.3234936 total: 1.84s remaining: 484ms 396: learn: 0.3232674 total: 1.85s remaining: 480ms 397: learn: 0.3230962 total: 1.85s remaining: 475ms 398: learn: 0.3229725 total: 1.86s remaining: 470ms 399: learn: 0.3228008 total: 1.86s remaining: 466ms 400: learn: 0.3226170 total: 1.87s remaining: 461ms 401: learn: 0.3224791 total: 1.87s remaining: 456ms 402: learn: 0.3222931 total: 1.88s remaining: 452ms 403: learn: 0.3220041 total: 1.88s remaining: 447ms 404: learn: 0.3218080 total: 1.89s remaining: 442ms 405: learn: 0.3215786 total: 1.89s remaining: 438ms 406: learn: 0.3213293 total: 1.9s remaining: 433ms 407: learn: 0.3211666 total: 1.9s remaining: 429ms 408: learn: 0.3209909 total: 1.91s remaining: 424ms 409: learn: 0.3209621 total: 1.91s remaining: 419ms 410: learn: 0.3208176 total: 1.92s remaining: 415ms 411: learn: 0.3207185 total: 1.92s remaining: 410ms 412: learn: 0.3205567 total: 1.92s remaining: 405ms 413: learn: 0.3204339 total: 1.93s remaining: 401ms 414: learn: 0.3202224 total: 1.93s remaining: 396ms 415: learn: 0.3201699 total: 1.94s remaining: 391ms 416: learn: 0.3200657 total: 1.94s remaining: 387ms 417: learn: 0.3199241 total: 1.95s remaining: 382ms 418: learn: 0.3197214 total: 1.95s remaining: 377ms 419: learn: 0.3195637 total: 1.96s remaining: 373ms 420: learn: 0.3194919 total: 1.96s remaining: 368ms 421: learn: 0.3193358 total: 1.96s remaining: 363ms 422: learn: 0.3191969 total: 1.97s remaining: 358ms 423: learn: 0.3190577 total: 1.97s remaining: 354ms 424: learn: 0.3189382 total: 1.98s remaining: 349ms 425: learn: 0.3187837 total: 1.98s remaining: 344ms 426: learn: 0.3186229 total: 1.99s remaining: 340ms 427: learn: 0.3184402 total: 1.99s remaining: 335ms 428: learn: 0.3181934 total: 2s remaining: 330ms 429: learn: 0.3179372 total: 2s remaining: 326ms 430: learn: 0.3177930 total: 2s remaining: 321ms 431: learn: 0.3176247 total: 2.01s remaining: 316ms 432: learn: 0.3174426 total: 2.01s remaining: 312ms 433: learn: 0.3172850 total: 2.02s remaining: 307ms 434: learn: 0.3171810 total: 2.02s remaining: 302ms 435: learn: 0.3169647 total: 2.03s remaining: 298ms 436: learn: 0.3167284 total: 2.03s remaining: 293ms 437: learn: 0.3166068 total: 2.04s remaining: 288ms 438: learn: 0.3164993 total: 2.04s remaining: 284ms 439: learn: 0.3163399 total: 2.05s remaining: 279ms 440: learn: 0.3162061 total: 2.05s remaining: 274ms 441: learn: 0.3160192 total: 2.06s remaining: 270ms 442: learn: 0.3158436 total: 2.06s remaining: 265ms 443: learn: 0.3156906 total: 2.06s remaining: 260ms 444: learn: 0.3155182 total: 2.07s remaining: 256ms 445: learn: 0.3154327 total: 2.07s remaining: 251ms 446: learn: 0.3152776 total: 2.08s remaining: 247ms 447: learn: 0.3151075 total: 2.08s remaining: 242ms 448: learn: 0.3150200 total: 2.09s remaining: 237ms 449: learn: 0.3148317 total: 2.09s remaining: 233ms 450: learn: 0.3146968 total: 2.1s remaining: 228ms 451: learn: 0.3145142 total: 2.1s remaining: 223ms 452: learn: 0.3142551 total: 2.11s remaining: 219ms 453: learn: 0.3141559 total: 2.11s remaining: 214ms 454: learn: 0.3140553 total: 2.12s remaining: 209ms 455: learn: 0.3139161 total: 2.12s remaining: 205ms 456: learn: 0.3137254 total: 2.12s remaining: 200ms 457: learn: 0.3135849 total: 2.13s remaining: 195ms 458: learn: 0.3134253 total: 2.13s remaining: 191ms 459: learn: 0.3132513 total: 2.14s remaining: 186ms 460: learn: 0.3131640 total: 2.14s remaining: 181ms 461: learn: 0.3130146 total: 2.15s remaining: 177ms 462: learn: 0.3129096 total: 2.15s remaining: 172ms 463: learn: 0.3128294 total: 2.16s remaining: 167ms 464: learn: 0.3126863 total: 2.16s remaining: 163ms 465: learn: 0.3124912 total: 2.17s remaining: 158ms 466: learn: 0.3123493 total: 2.17s remaining: 153ms 467: learn: 0.3121929 total: 2.17s remaining: 149ms 468: learn: 0.3119701 total: 2.18s remaining: 144ms 469: learn: 0.3118240 total: 2.18s remaining: 139ms 470: learn: 0.3116671 total: 2.19s remaining: 135ms 471: learn: 0.3116504 total: 2.19s remaining: 130ms 472: learn: 0.3114849 total: 2.2s remaining: 125ms 473: learn: 0.3113597 total: 2.2s remaining: 121ms 474: learn: 0.3111754 total: 2.21s remaining: 116ms 475: learn: 0.3110146 total: 2.21s remaining: 111ms 476: learn: 0.3108544 total: 2.21s remaining: 107ms 477: learn: 0.3107609 total: 2.22s remaining: 102ms 478: learn: 0.3105628 total: 2.22s remaining: 97.5ms 479: learn: 0.3104318 total: 2.23s remaining: 92.9ms 480: learn: 0.3102733 total: 2.23s remaining: 88.2ms 481: learn: 0.3100673 total: 2.24s remaining: 83.6ms 482: learn: 0.3099569 total: 2.24s remaining: 78.9ms 483: learn: 0.3097728 total: 2.25s remaining: 74.3ms 484: learn: 0.3096936 total: 2.25s remaining: 69.7ms 485: learn: 0.3095276 total: 2.26s remaining: 65ms 486: learn: 0.3093053 total: 2.26s remaining: 60.4ms 487: learn: 0.3092215 total: 2.27s remaining: 55.7ms 488: learn: 0.3090390 total: 2.27s remaining: 51.1ms 489: learn: 0.3088935 total: 2.28s remaining: 46.5ms 490: learn: 0.3087968 total: 2.28s remaining: 41.8ms 491: learn: 0.3086630 total: 2.29s remaining: 37.2ms 492: learn: 0.3085718 total: 2.29s remaining: 32.5ms 493: learn: 0.3084535 total: 2.29s remaining: 27.9ms 494: learn: 0.3082034 total: 2.3s remaining: 23.2ms 495: learn: 0.3081250 total: 2.3s remaining: 18.6ms 496: learn: 0.3079642 total: 2.31s remaining: 13.9ms 497: learn: 0.3078798 total: 2.31s remaining: 9.29ms 498: learn: 0.3077689 total: 2.32s remaining: 4.64ms 499: learn: 0.3076681 total: 2.32s remaining: 0us
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-199-c32dba8c5208> in <module> 6 7 lgbmc_mod_all= LGBMClassifier(learning_rate = 0.05, max_depth = 12, n_estimators = 150) ----> 8 lgbmc_mod_all.fit(X) TypeError: fit() missing 1 required positional argument: 'y'
validation_df = pd.read_csv("test.csv", sep=',', engine='python')
validation_df
| PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0013_01 | Earth | True | G/3/S | TRAPPIST-1e | 27.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Nelly Carsoning |
| 1 | 0018_01 | Earth | False | F/4/S | TRAPPIST-1e | 19.0 | False | 0.0 | 9.0 | 0.0 | 2823.0 | 0.0 | Lerome Peckers |
| 2 | 0019_01 | Europa | True | C/0/S | 55 Cancri e | 31.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Sabih Unhearfus |
| 3 | 0021_01 | Europa | False | C/1/S | TRAPPIST-1e | 38.0 | False | 0.0 | 6652.0 | 0.0 | 181.0 | 585.0 | Meratz Caltilter |
| 4 | 0023_01 | Earth | False | F/5/S | TRAPPIST-1e | 20.0 | False | 10.0 | 0.0 | 635.0 | 0.0 | 0.0 | Brence Harperez |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4272 | 9266_02 | Earth | True | G/1496/S | TRAPPIST-1e | 34.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Jeron Peter |
| 4273 | 9269_01 | Earth | False | NaN | TRAPPIST-1e | 42.0 | False | 0.0 | 847.0 | 17.0 | 10.0 | 144.0 | Matty Scheron |
| 4274 | 9271_01 | Mars | True | D/296/P | 55 Cancri e | NaN | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Jayrin Pore |
| 4275 | 9273_01 | Europa | False | D/297/P | NaN | NaN | False | 0.0 | 2680.0 | 0.0 | 0.0 | 523.0 | Kitakan Conale |
| 4276 | 9277_01 | Earth | True | G/1498/S | PSO J318.5-22 | 43.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Lilace Leonzaley |
4277 rows × 13 columns
passengerId_df = validation_df.loc[:, ['PassengerId']]
validation_df, am, em = data_transform(validation_df, for_visual_df, age_medians_all, expenses_means_all)
validation_df = encode_data(validation_df, nominal_encoder_all, scaler_all)
validation_df = memory_optimiz(validation_df)
validation_df
------ HomePlanet_update ------------ HomePlanet NULLs: 87 1. (after replacement through the GroupId) HomePlanet NULLs: 87 2. (after replacement through the Deck) HomePlanet NULLs: 51 3. (after replacement through the LastName) HomePlanet NULLs: 9 4. (after replacement through the Destination): 0 5. (after replacement by the most common value) HomePlanet NULLs: 0 ------ Destination_update ------------ Destination NULLs: 92 (after replacement by the most common value) Destination NULLs: 0 ------ LastName_update ------------ LastName NULLs: 94 (3012, 2) (after update) LastName NULLs: 51 namesakes_num_in_group NULLs: 51 namesakes_num_in_group NULLs: 0 ------ CabinDeck_update ------------ CabinDeck NULLs: 100 (after update throw GroupId) CabinDeck NULLs: 63 (after update throw HomePlanet) CabinDeck NULLs: 0 ------ CabinNum_update ------------ CabinNum NULLs: 100 ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'T'] A (1, 1) B (11, 1) C (6, 1) D (1, 1) E (2, 1) F (25, 1) G (54, 1) CabinNum NULLs: 0 ------ CabinSide_update ------------ CabinSide NULLs: 100 CabinSide NULLs: 100 CabinSide NULLs: 0 ------ VIP_update ------------ VIP NULLs: 93 VIP NULLs: 0 ------ CryoSleep_update ------------ 93 57 0 ------ Age_update ------------ Nulls in Expenses: 91 Nulls in Expenses: 0 Nulls in Expenses: 0 ------ Expenses_update ------------ Nulls in Expenses: 467 Nulls in Expenses: 426 Nulls in Expenses: 275 Nulls in Expenses: 275 Nulls in Expenses: 0 ------ log_expenses ------------ ------ New_features_update ------------ ------ bool_to_int ------------ ------ memory_optimiz ------------ Reduction = 48.89%
| x0_Earth | x0_Europa | x0_Mars | x1_55 Cancri e | x1_PSO J318.5-22 | x1_TRAPPIST-1e | x2_O | x2_P | x2_S | x3_A | ... | GroupId | NumInGroup | CabinNum | IsChild | TotalSpend | GroupSize | IsSingle | namesakes_num_in_group | NameLength | NoSpend | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.001293 | 0.000000 | 0.003082 | 0.0 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.833333 | 1.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.001832 | 0.000000 | 0.003610 | 0.0 | 0.258001 | 0.000000 | 1.0 | 0.000000 | 0.777778 | 0.0 |
| 2 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.001940 | 0.000000 | 0.001501 | 0.0 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.833333 | 1.0 |
| 3 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.002155 | 0.000000 | 0.002028 | 0.0 | 0.513060 | 0.000000 | 1.0 | 0.000000 | 0.888889 | 0.0 |
| 4 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.002371 | 0.000000 | 0.004137 | 0.0 | 0.222872 | 0.000000 | 1.0 | 0.000000 | 0.833333 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4272 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.998491 | 0.142857 | 0.790178 | 0.0 | 0.000000 | 0.142857 | 0.0 | 0.142857 | 0.611111 | 1.0 |
| 4273 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.998815 | 0.000000 | 0.793238 | 0.0 | 0.428165 | 0.000000 | 1.0 | 0.000000 | 0.722222 | 0.0 |
| 4274 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.999030 | 0.000000 | 0.157549 | 0.0 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.611111 | 1.0 |
| 4275 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.999246 | 0.000000 | 0.158076 | 0.0 | 0.356356 | 0.000000 | 1.0 | 0.000000 | 0.777778 | 0.0 |
| 4276 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.999677 | 0.000000 | 0.791232 | 0.0 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0.888889 | 1.0 |
4277 rows × 44 columns
gb_mod_all = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3, min_samples_leaf=5)
gb_mod_all.fit(X, y)
cbc_mod_all = CatBoostClassifier(n_estimators=500, learning_rate=0.05, max_depth=5)
cbc_mod_all.fit(X, y)
rf_mod_all = RandomForestClassifier(max_depth=10, min_samples_leaf=1, n_estimators = 250)
rf_mod_all.fit(X, y)
lgbmc_mod_all= LGBMClassifier(learning_rate = 0.05, max_depth = 12, n_estimators = 150)
lgbmc_mod_all.fit(X, y)
xgb_mod_all = XGBClassifier(n_estimators=250, learning_rate=0.1, max_depth=3)
xgb_mod_all.fit(X, y)
y_val_pred_gb_mod = gb_mod_all.predict_proba(validation_df)[:, 1]
y_val_pred_cbc_mod = cbc_mod_all.predict_proba(validation_df)[:, 1]
y_val_pred_lgbmc_mod = lgbmc_mod_all.predict_proba(validation_df)[:, 1]
y_val_pred_rf_mod = rf_mod_all.predict_proba(validation_df)[:, 1]
y_val_pred_xgb_mod = xgb_mod_all.predict_proba(validation_df)[:, 1]
y_val_pred_prob = pd.DataFrame((y_val_pred_gb_mod + y_val_pred_cbc_mod + y_val_pred_lgbmc_mod
+ y_val_pred_rf_mod + y_val_pred_xgb_mod) / 5)
0: learn: 0.6717601 total: 22.6ms remaining: 11.3s 1: learn: 0.6559285 total: 32ms remaining: 7.97s 2: learn: 0.6367266 total: 41.4ms remaining: 6.85s 3: learn: 0.6199487 total: 50.8ms remaining: 6.29s 4: learn: 0.6064402 total: 59ms remaining: 5.84s 5: learn: 0.5930960 total: 63.5ms remaining: 5.22s 6: learn: 0.5824253 total: 67.8ms remaining: 4.78s 7: learn: 0.5713423 total: 72.5ms remaining: 4.46s 8: learn: 0.5628770 total: 77ms remaining: 4.2s 9: learn: 0.5510038 total: 81.5ms remaining: 3.99s 10: learn: 0.5445644 total: 85.4ms remaining: 3.8s 11: learn: 0.5371470 total: 90ms remaining: 3.66s 12: learn: 0.5298268 total: 94.5ms remaining: 3.54s 13: learn: 0.5214459 total: 99.2ms remaining: 3.44s 14: learn: 0.5153465 total: 104ms remaining: 3.35s 15: learn: 0.5113514 total: 108ms remaining: 3.27s 16: learn: 0.5073899 total: 113ms remaining: 3.21s 17: learn: 0.5030747 total: 117ms remaining: 3.14s 18: learn: 0.4994706 total: 122ms remaining: 3.08s 19: learn: 0.4945849 total: 126ms remaining: 3.03s 20: learn: 0.4913744 total: 131ms remaining: 2.99s 21: learn: 0.4887091 total: 136ms remaining: 2.94s 22: learn: 0.4857981 total: 139ms remaining: 2.89s 23: learn: 0.4825017 total: 144ms remaining: 2.86s 24: learn: 0.4786623 total: 149ms remaining: 2.82s 25: learn: 0.4768492 total: 153ms remaining: 2.79s 26: learn: 0.4734889 total: 158ms remaining: 2.76s 27: learn: 0.4710118 total: 162ms remaining: 2.73s 28: learn: 0.4684620 total: 167ms remaining: 2.71s 29: learn: 0.4662380 total: 171ms remaining: 2.68s 30: learn: 0.4646370 total: 176ms remaining: 2.66s 31: learn: 0.4631152 total: 180ms remaining: 2.64s 32: learn: 0.4609342 total: 185ms remaining: 2.62s 33: learn: 0.4592114 total: 189ms remaining: 2.6s 34: learn: 0.4569889 total: 194ms remaining: 2.58s 35: learn: 0.4545138 total: 198ms remaining: 2.56s 36: learn: 0.4531351 total: 203ms remaining: 2.54s 37: learn: 0.4511333 total: 208ms remaining: 2.53s 38: learn: 0.4500232 total: 213ms remaining: 2.52s 39: learn: 0.4485887 total: 218ms remaining: 2.5s 40: learn: 0.4469055 total: 223ms remaining: 2.49s 41: learn: 0.4453079 total: 228ms remaining: 2.48s 42: learn: 0.4442801 total: 232ms remaining: 2.47s 43: learn: 0.4430573 total: 237ms remaining: 2.46s 44: learn: 0.4417083 total: 242ms remaining: 2.44s 45: learn: 0.4399265 total: 247ms remaining: 2.44s 46: learn: 0.4383401 total: 253ms remaining: 2.43s 47: learn: 0.4373329 total: 258ms remaining: 2.42s 48: learn: 0.4364893 total: 262ms remaining: 2.42s 49: learn: 0.4356206 total: 268ms remaining: 2.41s 50: learn: 0.4342541 total: 273ms remaining: 2.4s 51: learn: 0.4335778 total: 278ms remaining: 2.39s 52: learn: 0.4320319 total: 283ms remaining: 2.38s 53: learn: 0.4309256 total: 287ms remaining: 2.37s 54: learn: 0.4299535 total: 291ms remaining: 2.36s 55: learn: 0.4285534 total: 296ms remaining: 2.35s 56: learn: 0.4275038 total: 301ms remaining: 2.33s 57: learn: 0.4263364 total: 305ms remaining: 2.32s 58: learn: 0.4256917 total: 309ms remaining: 2.31s 59: learn: 0.4252728 total: 314ms remaining: 2.3s 60: learn: 0.4249211 total: 318ms remaining: 2.29s 61: learn: 0.4242374 total: 323ms remaining: 2.28s 62: learn: 0.4231913 total: 328ms remaining: 2.27s 63: learn: 0.4224360 total: 332ms remaining: 2.26s 64: learn: 0.4217434 total: 337ms remaining: 2.25s 65: learn: 0.4211012 total: 341ms remaining: 2.24s 66: learn: 0.4201431 total: 345ms remaining: 2.23s 67: learn: 0.4187252 total: 349ms remaining: 2.22s 68: learn: 0.4175828 total: 354ms remaining: 2.21s 69: learn: 0.4167893 total: 359ms remaining: 2.2s 70: learn: 0.4160514 total: 363ms remaining: 2.19s 71: learn: 0.4152627 total: 368ms remaining: 2.19s 72: learn: 0.4145157 total: 372ms remaining: 2.18s 73: learn: 0.4136453 total: 377ms remaining: 2.17s 74: learn: 0.4125564 total: 381ms remaining: 2.16s 75: learn: 0.4116931 total: 386ms remaining: 2.15s 76: learn: 0.4110673 total: 390ms remaining: 2.14s 77: learn: 0.4106763 total: 395ms remaining: 2.13s 78: learn: 0.4099986 total: 400ms remaining: 2.13s 79: learn: 0.4095265 total: 404ms remaining: 2.12s 80: learn: 0.4089138 total: 409ms remaining: 2.12s 81: learn: 0.4084111 total: 414ms remaining: 2.11s 82: learn: 0.4074390 total: 419ms remaining: 2.1s 83: learn: 0.4069154 total: 423ms remaining: 2.1s 84: learn: 0.4064620 total: 428ms remaining: 2.09s 85: learn: 0.4059731 total: 433ms remaining: 2.08s 86: learn: 0.4057280 total: 438ms remaining: 2.08s 87: learn: 0.4052259 total: 443ms remaining: 2.07s 88: learn: 0.4046340 total: 448ms remaining: 2.07s 89: learn: 0.4039782 total: 452ms remaining: 2.06s 90: learn: 0.4032069 total: 457ms remaining: 2.05s 91: learn: 0.4026369 total: 461ms remaining: 2.05s 92: learn: 0.4019732 total: 466ms remaining: 2.04s 93: learn: 0.4015387 total: 471ms remaining: 2.03s 94: learn: 0.4009854 total: 475ms remaining: 2.03s 95: learn: 0.4006115 total: 480ms remaining: 2.02s 96: learn: 0.3994422 total: 485ms remaining: 2.01s 97: learn: 0.3987352 total: 489ms remaining: 2.01s 98: learn: 0.3982851 total: 494ms remaining: 2s 99: learn: 0.3979062 total: 499ms remaining: 1.99s 100: learn: 0.3974923 total: 503ms remaining: 1.99s 101: learn: 0.3972004 total: 508ms remaining: 1.98s 102: learn: 0.3968142 total: 513ms remaining: 1.98s 103: learn: 0.3965742 total: 517ms remaining: 1.97s 104: learn: 0.3963670 total: 522ms remaining: 1.96s 105: learn: 0.3960710 total: 527ms remaining: 1.96s 106: learn: 0.3955506 total: 531ms remaining: 1.95s 107: learn: 0.3949300 total: 536ms remaining: 1.95s 108: learn: 0.3943364 total: 541ms remaining: 1.94s 109: learn: 0.3937671 total: 545ms remaining: 1.93s 110: learn: 0.3935486 total: 550ms remaining: 1.93s 111: learn: 0.3931848 total: 554ms remaining: 1.92s 112: learn: 0.3927611 total: 559ms remaining: 1.91s 113: learn: 0.3924328 total: 563ms remaining: 1.91s 114: learn: 0.3920901 total: 568ms remaining: 1.9s 115: learn: 0.3915730 total: 572ms remaining: 1.89s 116: learn: 0.3912976 total: 577ms remaining: 1.89s 117: learn: 0.3909855 total: 582ms remaining: 1.88s 118: learn: 0.3906130 total: 587ms remaining: 1.88s 119: learn: 0.3903298 total: 592ms remaining: 1.87s 120: learn: 0.3900764 total: 597ms remaining: 1.87s 121: learn: 0.3898187 total: 601ms remaining: 1.86s 122: learn: 0.3894357 total: 606ms remaining: 1.86s 123: learn: 0.3889836 total: 611ms remaining: 1.85s 124: learn: 0.3885785 total: 616ms remaining: 1.85s 125: learn: 0.3882237 total: 621ms remaining: 1.84s 126: learn: 0.3880068 total: 626ms remaining: 1.84s 127: learn: 0.3877897 total: 631ms remaining: 1.83s 128: learn: 0.3875918 total: 636ms remaining: 1.83s 129: learn: 0.3869848 total: 641ms remaining: 1.82s 130: learn: 0.3864431 total: 646ms remaining: 1.82s 131: learn: 0.3862427 total: 650ms remaining: 1.81s 132: learn: 0.3860463 total: 655ms remaining: 1.81s 133: learn: 0.3857634 total: 659ms remaining: 1.8s 134: learn: 0.3855350 total: 664ms remaining: 1.79s 135: learn: 0.3852828 total: 669ms remaining: 1.79s 136: learn: 0.3850812 total: 674ms remaining: 1.79s 137: learn: 0.3848158 total: 680ms remaining: 1.78s 138: learn: 0.3845824 total: 684ms remaining: 1.78s 139: learn: 0.3844295 total: 689ms remaining: 1.77s 140: learn: 0.3838684 total: 693ms remaining: 1.76s 141: learn: 0.3835324 total: 698ms remaining: 1.76s 142: learn: 0.3832876 total: 702ms remaining: 1.75s 143: learn: 0.3830584 total: 707ms remaining: 1.75s 144: learn: 0.3827970 total: 711ms remaining: 1.74s 145: learn: 0.3826102 total: 716ms remaining: 1.74s 146: learn: 0.3824430 total: 720ms remaining: 1.73s 147: learn: 0.3822307 total: 725ms remaining: 1.72s 148: learn: 0.3818330 total: 729ms remaining: 1.72s 149: learn: 0.3816120 total: 734ms remaining: 1.71s 150: learn: 0.3814018 total: 738ms remaining: 1.71s 151: learn: 0.3809961 total: 743ms remaining: 1.7s 152: learn: 0.3802546 total: 747ms remaining: 1.7s 153: learn: 0.3797225 total: 752ms remaining: 1.69s 154: learn: 0.3794987 total: 757ms remaining: 1.68s 155: learn: 0.3791479 total: 762ms remaining: 1.68s 156: learn: 0.3789332 total: 766ms remaining: 1.67s 157: learn: 0.3787225 total: 771ms remaining: 1.67s 158: learn: 0.3785415 total: 776ms remaining: 1.66s 159: learn: 0.3782864 total: 782ms remaining: 1.66s 160: learn: 0.3780342 total: 787ms remaining: 1.66s 161: learn: 0.3778526 total: 791ms remaining: 1.65s 162: learn: 0.3776400 total: 796ms remaining: 1.65s 163: learn: 0.3773102 total: 801ms remaining: 1.64s 164: learn: 0.3769944 total: 806ms remaining: 1.64s 165: learn: 0.3765577 total: 811ms remaining: 1.63s 166: learn: 0.3763370 total: 815ms remaining: 1.63s 167: learn: 0.3759730 total: 820ms remaining: 1.62s 168: learn: 0.3756746 total: 825ms remaining: 1.61s 169: learn: 0.3754188 total: 829ms remaining: 1.61s 170: learn: 0.3752272 total: 834ms remaining: 1.6s 171: learn: 0.3750105 total: 838ms remaining: 1.6s 172: learn: 0.3747266 total: 843ms remaining: 1.59s 173: learn: 0.3743836 total: 847ms remaining: 1.59s 174: learn: 0.3741939 total: 852ms remaining: 1.58s 175: learn: 0.3739720 total: 856ms remaining: 1.58s 176: learn: 0.3737801 total: 861ms remaining: 1.57s 177: learn: 0.3734711 total: 865ms remaining: 1.56s 178: learn: 0.3732889 total: 869ms remaining: 1.56s 179: learn: 0.3726123 total: 874ms remaining: 1.55s 180: learn: 0.3723271 total: 879ms remaining: 1.55s 181: learn: 0.3721449 total: 883ms remaining: 1.54s 182: learn: 0.3718879 total: 888ms remaining: 1.54s 183: learn: 0.3716904 total: 892ms remaining: 1.53s 184: learn: 0.3714175 total: 897ms remaining: 1.53s 185: learn: 0.3711286 total: 901ms remaining: 1.52s 186: learn: 0.3706215 total: 906ms remaining: 1.51s 187: learn: 0.3703787 total: 910ms remaining: 1.51s 188: learn: 0.3700724 total: 915ms remaining: 1.5s 189: learn: 0.3691602 total: 919ms remaining: 1.5s 190: learn: 0.3689259 total: 924ms remaining: 1.49s 191: learn: 0.3684667 total: 928ms remaining: 1.49s 192: learn: 0.3682771 total: 933ms remaining: 1.48s 193: learn: 0.3681055 total: 937ms remaining: 1.48s 194: learn: 0.3678036 total: 942ms remaining: 1.47s 195: learn: 0.3676447 total: 946ms remaining: 1.47s 196: learn: 0.3674059 total: 951ms remaining: 1.46s 197: learn: 0.3671371 total: 956ms remaining: 1.46s 198: learn: 0.3668385 total: 961ms remaining: 1.45s 199: learn: 0.3666176 total: 965ms remaining: 1.45s 200: learn: 0.3664413 total: 970ms remaining: 1.44s 201: learn: 0.3661534 total: 975ms remaining: 1.44s 202: learn: 0.3658395 total: 980ms remaining: 1.43s 203: learn: 0.3653964 total: 984ms remaining: 1.43s 204: learn: 0.3652394 total: 989ms remaining: 1.42s 205: learn: 0.3650271 total: 994ms remaining: 1.42s 206: learn: 0.3648086 total: 998ms remaining: 1.41s 207: learn: 0.3645508 total: 1s remaining: 1.41s 208: learn: 0.3642864 total: 1.01s remaining: 1.4s 209: learn: 0.3639472 total: 1.01s remaining: 1.4s 210: learn: 0.3635741 total: 1.02s remaining: 1.39s 211: learn: 0.3631929 total: 1.02s remaining: 1.39s 212: learn: 0.3627399 total: 1.03s remaining: 1.38s 213: learn: 0.3624739 total: 1.03s remaining: 1.38s 214: learn: 0.3622297 total: 1.03s remaining: 1.37s 215: learn: 0.3620023 total: 1.04s remaining: 1.37s 216: learn: 0.3618196 total: 1.04s remaining: 1.36s 217: learn: 0.3614155 total: 1.05s remaining: 1.36s 218: learn: 0.3609358 total: 1.05s remaining: 1.35s 219: learn: 0.3607155 total: 1.06s remaining: 1.35s 220: learn: 0.3604386 total: 1.06s remaining: 1.34s 221: learn: 0.3601874 total: 1.07s remaining: 1.34s 222: learn: 0.3598694 total: 1.07s remaining: 1.33s 223: learn: 0.3596441 total: 1.08s remaining: 1.32s 224: learn: 0.3594202 total: 1.08s remaining: 1.32s 225: learn: 0.3590983 total: 1.08s remaining: 1.31s 226: learn: 0.3588792 total: 1.09s remaining: 1.31s 227: learn: 0.3586156 total: 1.09s remaining: 1.3s 228: learn: 0.3583252 total: 1.1s remaining: 1.3s 229: learn: 0.3580778 total: 1.1s remaining: 1.29s 230: learn: 0.3577989 total: 1.11s remaining: 1.29s 231: learn: 0.3575176 total: 1.11s remaining: 1.28s 232: learn: 0.3572006 total: 1.11s remaining: 1.28s 233: learn: 0.3568898 total: 1.12s remaining: 1.27s 234: learn: 0.3566845 total: 1.12s remaining: 1.27s 235: learn: 0.3563817 total: 1.13s remaining: 1.26s 236: learn: 0.3560629 total: 1.13s remaining: 1.26s 237: learn: 0.3558409 total: 1.14s remaining: 1.25s 238: learn: 0.3555551 total: 1.14s remaining: 1.25s 239: learn: 0.3552568 total: 1.15s remaining: 1.24s 240: learn: 0.3548932 total: 1.15s remaining: 1.24s 241: learn: 0.3546428 total: 1.16s remaining: 1.23s 242: learn: 0.3544089 total: 1.16s remaining: 1.23s 243: learn: 0.3540873 total: 1.17s remaining: 1.22s 244: learn: 0.3538193 total: 1.17s remaining: 1.22s 245: learn: 0.3534859 total: 1.18s remaining: 1.21s 246: learn: 0.3530990 total: 1.18s remaining: 1.21s 247: learn: 0.3528541 total: 1.19s remaining: 1.2s 248: learn: 0.3524647 total: 1.19s remaining: 1.2s 249: learn: 0.3522479 total: 1.2s remaining: 1.2s 250: learn: 0.3520987 total: 1.2s remaining: 1.19s 251: learn: 0.3518770 total: 1.2s remaining: 1.18s 252: learn: 0.3516042 total: 1.21s remaining: 1.18s 253: learn: 0.3514098 total: 1.21s remaining: 1.17s 254: learn: 0.3511992 total: 1.22s remaining: 1.17s 255: learn: 0.3509905 total: 1.22s remaining: 1.16s 256: learn: 0.3507944 total: 1.23s remaining: 1.16s 257: learn: 0.3505689 total: 1.23s remaining: 1.15s 258: learn: 0.3503269 total: 1.24s remaining: 1.15s 259: learn: 0.3501707 total: 1.24s remaining: 1.14s 260: learn: 0.3499277 total: 1.24s remaining: 1.14s 261: learn: 0.3493997 total: 1.25s remaining: 1.13s 262: learn: 0.3491493 total: 1.25s remaining: 1.13s 263: learn: 0.3489810 total: 1.26s remaining: 1.12s 264: learn: 0.3488104 total: 1.26s remaining: 1.12s 265: learn: 0.3485737 total: 1.27s remaining: 1.11s 266: learn: 0.3483035 total: 1.27s remaining: 1.11s 267: learn: 0.3481858 total: 1.27s remaining: 1.1s 268: learn: 0.3479382 total: 1.28s remaining: 1.1s 269: learn: 0.3477118 total: 1.28s remaining: 1.09s 270: learn: 0.3474673 total: 1.29s remaining: 1.09s 271: learn: 0.3472235 total: 1.29s remaining: 1.08s 272: learn: 0.3470084 total: 1.3s remaining: 1.08s 273: learn: 0.3468528 total: 1.3s remaining: 1.07s 274: learn: 0.3466809 total: 1.31s remaining: 1.07s 275: learn: 0.3464293 total: 1.31s remaining: 1.06s 276: learn: 0.3461370 total: 1.31s remaining: 1.06s 277: learn: 0.3459567 total: 1.32s remaining: 1.05s 278: learn: 0.3457771 total: 1.32s remaining: 1.05s 279: learn: 0.3455924 total: 1.33s remaining: 1.04s 280: learn: 0.3453867 total: 1.33s remaining: 1.04s 281: learn: 0.3450915 total: 1.34s remaining: 1.03s 282: learn: 0.3448678 total: 1.34s remaining: 1.03s 283: learn: 0.3445951 total: 1.35s remaining: 1.03s 284: learn: 0.3444689 total: 1.35s remaining: 1.02s 285: learn: 0.3441973 total: 1.36s remaining: 1.02s 286: learn: 0.3439830 total: 1.36s remaining: 1.01s 287: learn: 0.3438204 total: 1.37s remaining: 1.01s 288: learn: 0.3436561 total: 1.37s remaining: 1s 289: learn: 0.3434784 total: 1.38s remaining: 998ms 290: learn: 0.3432609 total: 1.38s remaining: 993ms 291: learn: 0.3430424 total: 1.39s remaining: 988ms 292: learn: 0.3428841 total: 1.39s remaining: 983ms 293: learn: 0.3425600 total: 1.4s remaining: 979ms 294: learn: 0.3424239 total: 1.4s remaining: 974ms 295: learn: 0.3422246 total: 1.41s remaining: 969ms 296: learn: 0.3420495 total: 1.41s remaining: 964ms 297: learn: 0.3417844 total: 1.41s remaining: 959ms 298: learn: 0.3416426 total: 1.42s remaining: 954ms 299: learn: 0.3413840 total: 1.42s remaining: 949ms 300: learn: 0.3411010 total: 1.43s remaining: 944ms 301: learn: 0.3407767 total: 1.43s remaining: 939ms 302: learn: 0.3405628 total: 1.44s remaining: 934ms 303: learn: 0.3403627 total: 1.44s remaining: 929ms 304: learn: 0.3401680 total: 1.45s remaining: 925ms 305: learn: 0.3400366 total: 1.45s remaining: 920ms 306: learn: 0.3398456 total: 1.46s remaining: 915ms 307: learn: 0.3396233 total: 1.46s remaining: 910ms 308: learn: 0.3393664 total: 1.46s remaining: 905ms 309: learn: 0.3390886 total: 1.47s remaining: 900ms 310: learn: 0.3389059 total: 1.47s remaining: 895ms 311: learn: 0.3386825 total: 1.48s remaining: 890ms 312: learn: 0.3384965 total: 1.48s remaining: 885ms 313: learn: 0.3382500 total: 1.49s remaining: 881ms 314: learn: 0.3381051 total: 1.49s remaining: 876ms 315: learn: 0.3378937 total: 1.5s remaining: 871ms 316: learn: 0.3376916 total: 1.5s remaining: 866ms 317: learn: 0.3375115 total: 1.5s remaining: 861ms 318: learn: 0.3373210 total: 1.51s remaining: 856ms 319: learn: 0.3371033 total: 1.51s remaining: 852ms 320: learn: 0.3369182 total: 1.52s remaining: 847ms 321: learn: 0.3367748 total: 1.52s remaining: 842ms 322: learn: 0.3366565 total: 1.53s remaining: 837ms 323: learn: 0.3365268 total: 1.53s remaining: 833ms 324: learn: 0.3364213 total: 1.54s remaining: 828ms 325: learn: 0.3362802 total: 1.54s remaining: 823ms 326: learn: 0.3359843 total: 1.55s remaining: 818ms 327: learn: 0.3357722 total: 1.55s remaining: 814ms 328: learn: 0.3355504 total: 1.56s remaining: 809ms 329: learn: 0.3353803 total: 1.56s remaining: 804ms 330: learn: 0.3352275 total: 1.56s remaining: 799ms 331: learn: 0.3350715 total: 1.57s remaining: 795ms 332: learn: 0.3348467 total: 1.57s remaining: 790ms 333: learn: 0.3346739 total: 1.58s remaining: 785ms 334: learn: 0.3344955 total: 1.58s remaining: 780ms 335: learn: 0.3343182 total: 1.59s remaining: 775ms 336: learn: 0.3341957 total: 1.59s remaining: 770ms 337: learn: 0.3339743 total: 1.6s remaining: 766ms 338: learn: 0.3337593 total: 1.6s remaining: 761ms 339: learn: 0.3336233 total: 1.61s remaining: 756ms 340: learn: 0.3334741 total: 1.61s remaining: 751ms 341: learn: 0.3332315 total: 1.61s remaining: 746ms 342: learn: 0.3330434 total: 1.62s remaining: 741ms 343: learn: 0.3328270 total: 1.62s remaining: 736ms 344: learn: 0.3327191 total: 1.63s remaining: 732ms 345: learn: 0.3324520 total: 1.63s remaining: 727ms 346: learn: 0.3322861 total: 1.64s remaining: 722ms 347: learn: 0.3321402 total: 1.64s remaining: 717ms 348: learn: 0.3318896 total: 1.65s remaining: 712ms 349: learn: 0.3317607 total: 1.65s remaining: 708ms 350: learn: 0.3315616 total: 1.66s remaining: 703ms 351: learn: 0.3314236 total: 1.66s remaining: 698ms 352: learn: 0.3312920 total: 1.66s remaining: 693ms 353: learn: 0.3310145 total: 1.67s remaining: 688ms 354: learn: 0.3307983 total: 1.67s remaining: 683ms 355: learn: 0.3306109 total: 1.68s remaining: 679ms 356: learn: 0.3303871 total: 1.68s remaining: 674ms 357: learn: 0.3302206 total: 1.69s remaining: 669ms 358: learn: 0.3300003 total: 1.69s remaining: 664ms 359: learn: 0.3298057 total: 1.7s remaining: 660ms 360: learn: 0.3296643 total: 1.7s remaining: 655ms 361: learn: 0.3294424 total: 1.71s remaining: 650ms 362: learn: 0.3292544 total: 1.71s remaining: 645ms 363: learn: 0.3290123 total: 1.71s remaining: 641ms 364: learn: 0.3287291 total: 1.72s remaining: 636ms 365: learn: 0.3285776 total: 1.72s remaining: 631ms 366: learn: 0.3284752 total: 1.73s remaining: 627ms 367: learn: 0.3282392 total: 1.73s remaining: 622ms 368: learn: 0.3280258 total: 1.74s remaining: 617ms 369: learn: 0.3278825 total: 1.74s remaining: 612ms 370: learn: 0.3276901 total: 1.75s remaining: 608ms 371: learn: 0.3275092 total: 1.75s remaining: 603ms 372: learn: 0.3273203 total: 1.76s remaining: 598ms 373: learn: 0.3271396 total: 1.76s remaining: 593ms 374: learn: 0.3269237 total: 1.76s remaining: 589ms 375: learn: 0.3267089 total: 1.77s remaining: 584ms 376: learn: 0.3265159 total: 1.77s remaining: 579ms 377: learn: 0.3263222 total: 1.78s remaining: 574ms 378: learn: 0.3261838 total: 1.78s remaining: 570ms 379: learn: 0.3260458 total: 1.79s remaining: 565ms 380: learn: 0.3258971 total: 1.79s remaining: 560ms 381: learn: 0.3256985 total: 1.8s remaining: 555ms 382: learn: 0.3255459 total: 1.8s remaining: 551ms 383: learn: 0.3253767 total: 1.81s remaining: 546ms 384: learn: 0.3252537 total: 1.81s remaining: 541ms 385: learn: 0.3250408 total: 1.82s remaining: 536ms 386: learn: 0.3248237 total: 1.82s remaining: 532ms 387: learn: 0.3246429 total: 1.82s remaining: 527ms 388: learn: 0.3245438 total: 1.83s remaining: 522ms 389: learn: 0.3244019 total: 1.83s remaining: 517ms 390: learn: 0.3242587 total: 1.84s remaining: 513ms 391: learn: 0.3241970 total: 1.84s remaining: 508ms 392: learn: 0.3240150 total: 1.85s remaining: 503ms 393: learn: 0.3238193 total: 1.85s remaining: 498ms 394: learn: 0.3236717 total: 1.86s remaining: 494ms 395: learn: 0.3234936 total: 1.86s remaining: 489ms 396: learn: 0.3232674 total: 1.87s remaining: 484ms 397: learn: 0.3230962 total: 1.87s remaining: 480ms 398: learn: 0.3229725 total: 1.88s remaining: 475ms 399: learn: 0.3228008 total: 1.88s remaining: 470ms 400: learn: 0.3226170 total: 1.89s remaining: 466ms 401: learn: 0.3224791 total: 1.89s remaining: 461ms 402: learn: 0.3222931 total: 1.9s remaining: 456ms 403: learn: 0.3220041 total: 1.9s remaining: 451ms 404: learn: 0.3218080 total: 1.9s remaining: 447ms 405: learn: 0.3215786 total: 1.91s remaining: 442ms 406: learn: 0.3213293 total: 1.91s remaining: 438ms 407: learn: 0.3211666 total: 1.92s remaining: 433ms 408: learn: 0.3209909 total: 1.92s remaining: 428ms 409: learn: 0.3209621 total: 1.93s remaining: 424ms 410: learn: 0.3208176 total: 1.93s remaining: 419ms 411: learn: 0.3207185 total: 1.94s remaining: 414ms 412: learn: 0.3205567 total: 1.94s remaining: 410ms 413: learn: 0.3204339 total: 1.95s remaining: 405ms 414: learn: 0.3202224 total: 1.95s remaining: 400ms 415: learn: 0.3201699 total: 1.96s remaining: 395ms 416: learn: 0.3200657 total: 1.96s remaining: 391ms 417: learn: 0.3199241 total: 1.97s remaining: 386ms 418: learn: 0.3197214 total: 1.97s remaining: 381ms 419: learn: 0.3195637 total: 1.98s remaining: 376ms 420: learn: 0.3194919 total: 1.98s remaining: 372ms 421: learn: 0.3193358 total: 1.99s remaining: 367ms 422: learn: 0.3191969 total: 1.99s remaining: 362ms 423: learn: 0.3190577 total: 1.99s remaining: 358ms 424: learn: 0.3189382 total: 2s remaining: 353ms 425: learn: 0.3187837 total: 2s remaining: 348ms 426: learn: 0.3186229 total: 2.01s remaining: 343ms 427: learn: 0.3184402 total: 2.01s remaining: 339ms 428: learn: 0.3181934 total: 2.02s remaining: 334ms 429: learn: 0.3179372 total: 2.02s remaining: 329ms 430: learn: 0.3177930 total: 2.03s remaining: 324ms 431: learn: 0.3176247 total: 2.03s remaining: 320ms 432: learn: 0.3174426 total: 2.04s remaining: 315ms 433: learn: 0.3172850 total: 2.04s remaining: 310ms 434: learn: 0.3171810 total: 2.04s remaining: 306ms 435: learn: 0.3169647 total: 2.05s remaining: 301ms 436: learn: 0.3167284 total: 2.05s remaining: 296ms 437: learn: 0.3166068 total: 2.06s remaining: 291ms 438: learn: 0.3164993 total: 2.06s remaining: 287ms 439: learn: 0.3163399 total: 2.07s remaining: 282ms 440: learn: 0.3162061 total: 2.07s remaining: 277ms 441: learn: 0.3160192 total: 2.08s remaining: 273ms 442: learn: 0.3158436 total: 2.08s remaining: 268ms 443: learn: 0.3156906 total: 2.09s remaining: 263ms 444: learn: 0.3155182 total: 2.09s remaining: 259ms 445: learn: 0.3154327 total: 2.1s remaining: 254ms 446: learn: 0.3152776 total: 2.1s remaining: 249ms 447: learn: 0.3151075 total: 2.11s remaining: 244ms 448: learn: 0.3150200 total: 2.11s remaining: 240ms 449: learn: 0.3148317 total: 2.12s remaining: 235ms 450: learn: 0.3146968 total: 2.12s remaining: 230ms 451: learn: 0.3145142 total: 2.12s remaining: 226ms 452: learn: 0.3142551 total: 2.13s remaining: 221ms 453: learn: 0.3141559 total: 2.13s remaining: 216ms 454: learn: 0.3140553 total: 2.14s remaining: 211ms 455: learn: 0.3139161 total: 2.14s remaining: 207ms 456: learn: 0.3137254 total: 2.15s remaining: 202ms 457: learn: 0.3135849 total: 2.15s remaining: 197ms 458: learn: 0.3134253 total: 2.16s remaining: 193ms 459: learn: 0.3132513 total: 2.16s remaining: 188ms 460: learn: 0.3131640 total: 2.17s remaining: 183ms 461: learn: 0.3130146 total: 2.17s remaining: 178ms 462: learn: 0.3129096 total: 2.17s remaining: 174ms 463: learn: 0.3128294 total: 2.18s remaining: 169ms 464: learn: 0.3126863 total: 2.18s remaining: 164ms 465: learn: 0.3124912 total: 2.19s remaining: 160ms 466: learn: 0.3123493 total: 2.19s remaining: 155ms 467: learn: 0.3121929 total: 2.2s remaining: 150ms 468: learn: 0.3119701 total: 2.2s remaining: 146ms 469: learn: 0.3118240 total: 2.21s remaining: 141ms 470: learn: 0.3116671 total: 2.21s remaining: 136ms 471: learn: 0.3116504 total: 2.21s remaining: 131ms 472: learn: 0.3114849 total: 2.22s remaining: 127ms 473: learn: 0.3113597 total: 2.23s remaining: 122ms 474: learn: 0.3111754 total: 2.23s remaining: 117ms 475: learn: 0.3110146 total: 2.23s remaining: 113ms 476: learn: 0.3108544 total: 2.24s remaining: 108ms 477: learn: 0.3107609 total: 2.24s remaining: 103ms 478: learn: 0.3105628 total: 2.25s remaining: 98.6ms 479: learn: 0.3104318 total: 2.25s remaining: 93.9ms 480: learn: 0.3102733 total: 2.26s remaining: 89.2ms 481: learn: 0.3100673 total: 2.26s remaining: 84.5ms 482: learn: 0.3099569 total: 2.27s remaining: 79.8ms 483: learn: 0.3097728 total: 2.27s remaining: 75.1ms 484: learn: 0.3096936 total: 2.28s remaining: 70.4ms 485: learn: 0.3095276 total: 2.28s remaining: 65.7ms 486: learn: 0.3093053 total: 2.29s remaining: 61ms 487: learn: 0.3092215 total: 2.29s remaining: 56.4ms 488: learn: 0.3090390 total: 2.3s remaining: 51.7ms 489: learn: 0.3088935 total: 2.3s remaining: 47ms 490: learn: 0.3087968 total: 2.31s remaining: 42.3ms 491: learn: 0.3086630 total: 2.31s remaining: 37.6ms 492: learn: 0.3085718 total: 2.31s remaining: 32.9ms 493: learn: 0.3084535 total: 2.32s remaining: 28.2ms 494: learn: 0.3082034 total: 2.33s remaining: 23.5ms 495: learn: 0.3081250 total: 2.33s remaining: 18.8ms 496: learn: 0.3079642 total: 2.33s remaining: 14.1ms 497: learn: 0.3078798 total: 2.34s remaining: 9.4ms 498: learn: 0.3077689 total: 2.34s remaining: 4.7ms 499: learn: 0.3076681 total: 2.35s remaining: 0us
# ------------------ y_val_pred_prob = pd.DataFrame(cbc_mod_all.predict_proba(validation_df)[:, 1])
# y_val_pred = lgbmc_mod_all.predict(validation_df_pca)
# y_val_pred = lgbmc_mod_all.predict(validation_df)
#########y_val_pred = y_val_pred_prob.applymap(lambda x: x > 0.47419098173462404)
# y_val_pred = y_val_pred.astype(bool)
# y_val_pred
fig, axs = plt.subplots(figsize=(12, 6))
sns.histplot( x=y_val_pred_prob.loc[:, 0], bins=40, kde=True)
plt.xlabel('Predicted Propabilities')
Text(0.5, 0, 'Predicted Propabilities')
# der Prozentsatz der transportierten Passagiere im ursprünglichen Datensatz
transported_prc = y.value_counts(normalize=True)[1]
print('Transported % = ', transported_prc)
thresholds = np.linspace(0, 0.95, num=200, endpoint=False)
res = {}
for t in thresholds:
tmp_y_val_pred = y_val_pred_prob.applymap(lambda x: x > t)
res[t] = abs(tmp_y_val_pred.value_counts(normalize=True)[True] - transported_prc)
best_threshold = min(res, key=res.get)
print('best_threshold =', best_threshold)
Transported % = 0.5036236051995858 best_threshold = 0.5177499999999999
y_val_pred = y_val_pred_prob.applymap(lambda x: x > best_threshold)
passengerId_df['Transported'] = y_val_pred
passengerId_df
| PassengerId | Transported | |
|---|---|---|
| 0 | 0013_01 | True |
| 1 | 0018_01 | False |
| 2 | 0019_01 | True |
| 3 | 0021_01 | True |
| 4 | 0023_01 | True |
| ... | ... | ... |
| 4272 | 9266_02 | True |
| 4273 | 9269_01 | False |
| 4274 | 9271_01 | True |
| 4275 | 9273_01 | True |
| 4276 | 9277_01 | True |
4277 rows × 2 columns
passengerId_df.to_csv('sample_submission.csv', index=False)